DOCUMENT RESUME

ED 257 159                                            CS 504 934

AUTHOR          Studdert-Kennedy, Michael, Ed.; O'Brien, Nancy, Ed.
TITLE           Status Report on Speech Research: A Report on the
                Status and Progress of Studies on the Nature of
                Speech, Instrumentation for Its Investigation, and
                Practical Applications, January 1-March 31, 1985.
INSTITUTION     Haskins Labs., New Haven, Conn.
SPONS AGENCY    National Institutes of Health (DHHS), Bethesda, Md.;
                National Inst. of Child Health and Human Development
                (NIH), Bethesda, Md.; National Inst. of Neurological
                and Communicative Disorders and Stroke (NIH),
                Bethesda, Md.; National Science Foundation,
                Washington, D.C.; Office of Naval Research,
                Washington, D.C.
REPORT NO       SR-81-(1985)
PUB DATE        85
CONTRACT        NICHHD-N01-HD-5-2910; ONR-N00014-83-K-0083
GRANT           NICHHD-HD-01994; NICHHD-HD-16591; NIBBRS-RR-05596;
                NINCDS-NS-13617; NINCDS-NS-13870; NINCDS-NS-18010;
                NSF-BNS-8111470
NOTE            321p.
AVAILABLE FROM  U.S. Department of Commerce, National Technical
                Information Service, 5285 Port Royal Road,
                Springfield, Virginia 22151.
PUB TYPE        Reports - Research/Technical (143) -- Information
                Analyses (070)
EDRS PRICE      MF01/PC13 Plus Postage.
DESCRIPTORS     Academic Aptitude; Adults; *Articulation (Speech);
                Cognitive Processes; *Communication Research;
                Elementary Secondary Education; *Language Processing;
                *Language Research; Linguistic Theory; Memory;
                Perception; Reading Difficulties; *Speech
                Communication; Speech Handicaps; Speech Improvement;
                *Speech Skills; Spelling

ABSTRACT
     One of a regular series on the status and progress of
studies on the nature of speech, instrumentation for its
investigation, and practical applications, this report covers the
period January 1 to March 31, 1985. Studies summarized in the report
cover such topics as (1) segmentation of coarticulated speech in
perception, (2) intrinsic time in speech production, (3) a theoretical
model of phase transitions in hand movements, (4) repetitive naming
and the detection of word retrieval deficits in the beginning reader,
(5) linguistic abilities and spelling proficiency in kindergarten
children and adult poor spellers, (6) errors in short term memory for
good and poor readers, (7) longitudinal prediction and prevention of
early reading difficulty, (8) temporary memory for linguistic and
nonlinguistic material in relation to the acquisition of Japanese
Kana and Kanji, (9) speech perception and the prelinguistic infant,
(10) categorical trends in vowel imitation, (11) cognitive processes
in reading, and (12) processing kinematic data. (HOD)



NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)
This document has been reproduced as
received from the person or organization
originating it.
Minor changes have been made to improve
reproduction quality.
Points of view or opinions stated in this document
do not necessarily represent official NIE
position or policy.

SR-81 (1985)

Status Report on 



SPEECH RESEARCH 



A Report on 
the Status and Progress of Studies on 
the Nature of Speech, Instrumentation 
for Its Investigation, and Practical
Applications 



1 January - 31 March 1985 



Haskins Laboratories
270 Crown Street
New Haven, Conn. 06511



DISTRIBUTION OF THIS DOCUMENT IS UNLIMITED 



(The information in this document is available to the general
public. Haskins Laboratories distributes it primarily for library 
use. Copies are available from the National Technical Information 
Service or the ERIC Document Reproduction Service. See the 
Appendix for order numbers of previous Status Reports.) 






Michael Studdert-Kennedy, Editor-in-Chief 



Nancy O'Brien, Editor 
Margo Carter, Technical Illustrator
Gail Reynolds, Technical Coordinator






ACKNOWLEDGMENTS

The research reported here was made possible
in part by support from the following sources:

National Institute of Child Health and Human Development
Grant HD-01994
Grant HD-16591

National Institute of Child Health and Human Development
Contract N01-HD-5-2910

National Institutes of Health
Biomedical Research Support Grant RR-05596

National Science Foundation
Grant BNS-8111470

National Institute of Neurological and Communicative
Disorders and Stroke
Grant NS 13870
Grant NS 13617
Grant NS 18010

Office of Naval Research
Contract N00014-83-K-0083



HASKINS LABORATORIES PERSONNEL IN SPEECH RESEARCH

Investigators

Arthur S. Abramson*
Peter J. Alfonso*
Thomas Baer
Patrice S. Beddor+
Fredericka Bell-Berti*
Catherine Best*
Geoffrey Bingham+
Gloria J. Borden*
Susan Brady*
Catherine P. Browman
Franklin S. Cooper*
Stephen Crain*
Robert Crowder*
Laurie B. Feldman*
Anne Fowler*
Carol A. Fowler*
Louis Goldstein*
Vicki L. Hanson
Katherine S. Harris*
Sarah Hawkins++
Daniel Holender1
Satoshi Horiguchi2
Leonard Katz*
J. A. Scott Kelso
Gary Kidd*
Andrea G. Levitt*
Alvin M. Liberman*
Isabelle Y. Liberman*
Leigh Lisker*
Kristine MacKain*
Virginia Mann*
Ignatius G. Mattingly*
Nancy S. McGarr*
Kevin Munhall+++
Patrick W. Nye
Lawrence J. Raphael*
Bruno H. Repp
Philip E. Rubin*
Elliot Saltzman
Donald Shankweiler*
Mary Smith
Michael Studdert-Kennedy*
Betty Tuller*
Michael T. Turvey*
Ben C. Watson++
Douglas H. Whalen

Technical/Support

Michael Anstett
Margo Carter
Philip Chagnon
Alice Dadourian
Vincent Gulisano
Donald Hailey
Raymond C. Huey*
Sabina D. Koroluk
Bruce Martin
Betty J. Myers
Nancy O'Brien
Gail K. Reynolds
William P. Scully
Richard S. Sharkany
Edward R. Wiley

Students*

Christopher Allen
Joy Armson
Dragana Barac
Sara Basson
Eric Bateson
Suzanne Boyce
Teresa Clifford
Andre Cooper
Jan Edwards
Jo Estill
Nancy Fishbein
Carole E. Gelfer
Bruce Kay
Noriko Kobayashi
Rena A. Krakow
Hwei-Bing Lin
Katrina Lukatela
Harriet Magen
Sharon Manuel
Richard McGowan
Jerry McRoberts
Susan Nittrouer
Lawrence D. Rosenblum
Richard C. Schmidt
John Scholz
Robin Seider
Suzanne Smith
Katyanee Svastikula
Daniel Weiss
David Williams

*Part-time
1Free University of Brussels, Brussels, Belgium
2Visiting from University of Tokyo, Japan
+NIH Research Fellow
++NRSA Training Fellow
+++Natural Sciences and Engineering Research Council of Canada Fellow






Status Report on Speech Research 



Haskins Laboratories 



SEGMENTATION OF COARTICULATED SPEECH IN PERCEPTION*

Carol A. Fowler†



Abstract. The research investigates how listeners segment the
acoustic speech signal into phonetic segments and explores
implications that the segmentation strategy may have for their
perception of the (apparently) context-sensitive allophones of a
phoneme. Two manners of segmentation are contrasted. In one,
listeners segment the signal into temporally discrete,
context-sensitive segments. In the other, which may be consistent
with the talker's production of the segments, they partition the
signal into separate, but overlapping, segments freed of their
contextual influences. Two complementary predictions of the second
hypothesis are tested. First, listeners will use anticipatory
coarticulatory information for a segment as information for the
forthcoming segment. Second, subjects will not hear anticipatory
coarticulatory information as part of the phonetic segment with
which it co-occurs in time. The first hypothesis is supported by
findings on a choice reaction-time procedure; the second is
supported by findings on a 4IAX discrimination test. Implications of
the findings for theories of speech production, perception and of
the relation between the two are considered.

Skilled listeners sometimes behave as if they have not extracted all
of the phonetic structure from an acoustic speech signal. Listeners to fluent
speech more readily recognize target syllables and words than they recognize
target phonetic segments (McNeill & Lindig, 1973; Savin & Bever, 1970) and,
in their perceptions of a fluently-produced sequence, they are likely to
"restore" phonetic segments that are overdetermined and missing (Samuel, 1981;
Warren, 1970) or mispronounced (Marslen-Wilson & Welsh, 1978).

Despite this apparent inattention to the phonetic structure of speech by
skilled listeners, the structure persists in languages; that is, "duality of
patterning" (Hockett, 1960) is not, apparently, disappearing from them.
Indeed it is universal to languages, presumably because it is required to
maintain the openness of their open lexical classes. Moreover, the phonetic
structure of words is psychologically real, even to the skilled listeners just
described when they talk. For example, speech errors commonly consist of
phonetic-segment misorderings and substitutions (cf. Fromkin, 1973), and many
language games (including rhyming, alliteration, and Pig Latin, among others)
involve operations performed on the phonetic- (or phonological-) segmental
structure of words (cf. Pisoni, in press).1

*Perception & Psychophysics, 1984, 36, 359-368.
†Also Dartmouth College.

If the phonetic structure of words is to be perpetuated in languages, and
if language learners are to become talkers who Spoonerize and play language
games, the learners must be able to extract phonetic structure from an
acoustic speech signal even if they will not always do so when they become
skilled users of the language. This observation implies that an acoustic
speech signal must provide sufficient information for extraction of the
phonetic structure of the talker's intended message. Yet the signal has
provided major barriers to investigators' efforts to extract phonetic
segments from it.

Two related barriers are those of segmentation and invariance. Both
problems arise because speech is coarticulated; that is, because articulatory
gestures for successive phonetic segments are not temporally discrete. The
segmentation problem is to understand how separate phone-sized segments may be
extracted from a signal in which information for the segments overlaps in
time. The invariance problem is to rationalize listeners' classifications of
phonetic tokens into types. It is called the "invariance" problem because the
presumption has been (e.g., Stevens & Blumstein, 1981) that its solution lies
in discovering acoustic invariants that exist across tokens. The search for
invariance is rendered difficult by coarticulation, which ensures that the
acoustic signal during a time window most closely identifiable with one
phonetic segment is context-sensitive, not (wholly) invariant. Moreover, the
problem of explaining listeners' classifications goes beyond the search for
acoustic invariance. Certain sets of phones (for example, the [d]s in [di],
[da] and [du]) are always classified as tokens of a common phonemic type even
though acoustic information for the different tokens is largely (and, in some
synthetic stimuli, entirely) context-sensitive, and even though listeners
attend to the context-sensitive information (in the example, the
second-formant transitions) more closely than to any invariant information
(e.g., the shape of the release-burst spectrum; cf. Stevens & Blumstein, 1981)
that may be present, when they identify the phones (Walley & Carrell, 1983).

The present research contrasts two possible ways that listeners may
segment the acoustic speech signal into phone-sized segments. These strategies
offer different perspectives on the problem of explaining listeners'
classifications of apparently context-sensitive phonetic segments into types.

Figure 1a displays an acoustic speech signal schematically, and Figures
1b and 1c illustrate the two segmentation strategies. In Figure 1a, the
horizontal axis is time and the vertical axis a provisional dimension,
"prominence." The prominence of a segment in an acoustic signal refers to the
extent to which acoustic properties characteristic of that segment are salient
in the signal. For example, in a syllable, /si/, /s/ is more prominent than
/i/ during the frication noise even though production of /i/ begins before or
during closure for the frication (Carney & Moll, 1971), and evidence of its
production is available in the signal.

One segmentation strategy, illustrated in Figure 1b, uses the relative
prominence of successive segments to establish boundaries between them. This
essentially is the procedure used in the phonetics literature when
measurements are made of phonetic-segment durations, acoustically realized
(e.g., Klatt, 1975; Peterson & Lehiste, 1960; and see Lisker, 1972, for other
references). This procedure divides the acoustic speech signal into discrete
context-sensitive phonetic segments. Disagreements concerning where to draw
the segmentation lines arise when neither of two neighboring segments clearly
predominates in some acoustic interval. For example, Lisker (1972) cites
research in which vowel onsets sometimes include and sometimes exclude formant
transitions following consonant release.

Figure 1. Schematic display of segment production. In Figure 1a, segments
are produced in overlapping time frames. In Figure 1b, segmentations
are made at points in time when one segment ceases to
dominate in the signal and another takes over. This divides speech
into discrete, context-sensitive segments. In Figure 1c, speech is
segmented along coarticulatory lines, into overlapping segments
freed of their contextual influences.

A second possible strategy is illustrated in Figure 1c. The acoustic
signal is segmented along coarticulatory lines into overlapping phonetic
segments, free from the contextual influences of phonetic neighbors. Thus, for
example, in /si/, the onset of /i/ is identified where production of /i/ is
first detectable within the /s/ frication, not where its acoustic
manifestations begin to predominate in the signal.

Measurement conventions reflecting the segmentation strategy of Figure 1b
are adopted in the phonetics literature to maximize reliability, not
necessarily either to mimic listeners' segmentation strategies or to capture
any articulatory lines of segmentation that the signal may reflect. Indeed,
the literature offers a hint that the conventions do not mirror the listener's
manner of segmenting the signal. The hint is provided by two independently
developed, but possibly converging, lines of research.

First is evidence that listeners use anticipatory coarticulatory
influences on one phonetic segment as information for the influencing segment
(e.g., Alfonso & Baer, 1983; Martin & Bunnell, 1982; Ochiai & Fujimura,
1961; Whalen, 1984; but see Lehiste & Shockey, 1972). For example, Whalen
(1984) cross-spliced frication noises across tokens of /sa/, /su/, /ʃa/, and
/ʃu/ and asked listeners to identify the vowels of each syllable in a choice
reaction-time procedure. Listeners were faster and more accurate when the
frication noises provided accurate anticipatory information for the vowels
than when they provided misleading information.

In itself this finding can be explained assuming the segmentation
strategy of Figure 1b. Having partitioned the signal into discrete,
context-sensitive allophones (Wickelgren, 1969, 1976), listeners may use the
context-sensitivity of the allophone that precedes a target segment to predict
the target's identity.
The second line of research shows that listeners "compensate" for
coarticulatory influences on phonetic segments when they identify them
(Liberman, Delattre, & Cooper, 1952; Mann, 1980; Mann & Repp, 1980). For
example, /s/ and /ʃ/ are distinguished acoustically in part by the relative
locations of energy concentrations in their spectra, that for /s/ being higher
than that for /ʃ/. In the context of a following /u/, however, the spectra for
both consonants are lowered by anticipatory lip rounding. Compatibly,
listeners accept stimuli with concentrations of energy in lower frequencies as
tokens of /s/ if the frication is followed by a rounded vowel than if it is
followed by an unrounded vowel (Mann & Repp, 1980). The same frication noise
may be identified as /ʃ/ before /i/, but as /s/ before /u/.

Again, in itself, this finding can be explained assuming segmentation in- 
to discrete, context-sensitive segments. The explanation is that compensation 
reflects an adjustment to information for an earlier segment based on knowing 
the effects that anticipation of the following segment should have had on it 
(cf. Mann & Repp, 1980). 

Considered together, however, this account and the foregoing account of
listeners' use of anticipatory coarticulatory information are paradoxical.
For the segmentation hypothesis illustrated in Figure 1b to account for the
findings from the reaction-time procedure, it must be supposed that the
coarticulatory effects on a segment are identifiable as such before the onset
of the anticipated segment. Otherwise, reaction times would not be reduced
when coarticulatory information is "predictive." (Nor, as Meltzer, Martin,
Mills, Imhoff, & Zohar, 1976, have shown, would they be improved even further
when the anticipatory information is shifted earlier in time than its natural
time of occurrence.) However, for it to explain a finding of compensation, it
must be supposed that the segment following a context-sensitive allophone is
used to guide the identification of the contextual influence on the allophone.
That is, in the one instance, the coarticulatory information facilitates later
identification of the segment it anticipates; in the other, it can only be
identified as a coarticulatory influence on an allophone after the segment it
anticipates has itself been identified.

The alternative segmentation hypothesis under consideration satisfies
both sets of findings. Listeners may segment the speech stream along its
coarticulatory lines into overlapping phonetic segments (Figure 1c). There
are two consequences of this segmentation strategy: anticipatory
coarticulatory information is perceived as the onset of the segment it
"anticipates" and the same information therefore is not integrated with
concurrent information for the preceding phonetic segment. That is,
"compensation" occurs as a necessary by-product of segmentation.2 From this
perspective, compensation is symptomatic of an additional consequence of
segmentation. Because sources of context-sensitivity are not integrated with
information for segments with which they co-occur in time, the same phonetic
segment in different coarticulatory contexts is predicted to sound
approximately the same to listeners. That is, listeners may perceive the
tokens of a phonetic type as the same or very similar across different
phonetic contexts because sources of contextual influence have not been
integrated with the tokens.

The present research is designed to contrast the foregoing accounts of
segmentation. In particular, the research first asks whether listeners will
use coarticulatory information for a vowel within the acoustic domain of a
preceding phonetic segment (in the present case, /g/) as information for the
vowel (cf. Martin & Bunnell, 1982; Whalen, 1984). It next asks, if they do,
whether they also show evidence of "compensation" for contextual influences of
the vowel in their perceptual judgments of the preceding phonetic segment. If
listeners exhibit both behaviors on the same syllables, then the segmentation
strategy of Figure 1b can be ruled out on grounds previously outlined: that
strategy requires that listeners identify the coarticulatory information as
such before identifying the segment it anticipates in the paradigm used by
Martin and his colleagues; however, it requires the reverse ordering of
identification to explain apparent compensation. The segmentation strategy of
Figure 1c provides a unified account of both findings.

Use of anticipatory coarticulatory information to identify a forthcoming
segment will be tested using the cross-splicing, choice reaction-time
procedure developed by Martin and Bunnell (1982) and used by Whalen (1984).
Compensation will be assessed using a 4IAX discrimination procedure on the
same stimuli (cf. Pisoni, 1971).

Listeners are said to compensate for contextual influences if their
perceptual judgments of a phonetic segment suggest that the contextual
influences have been eliminated or reduced (cf. Mann & Repp, 1980). A 4IAX
trial such as the following will allow assessment of compensation:

     g_i i - g_i u     g_i i - g_u u

The trial includes four syllables temporally organized into pairs. Members of
a pair have different vowels, but the vowels are the same across the pairs.
"g" refers to a stop burst originally produced either in a [gi] syllable
(subscripted with "i") or in a [gu] syllable (subscripted with "u"). Subjects
are asked to decide which pair has members that sound more similar. If
listeners make their assessment based on the relative acoustic overlap between
members of a pair, they should select the members of the first pair as more
similar than members of the second because the former have identical bursts.
The opposite prediction is made if listeners make their judgments with
contextual influences eliminated from the different consonantal segments (that
is, if they "compensate" for those influences). In that case, influences of
different vowels are eliminated from identical stop bursts in the first pair,
yielding different residuals. In the second pair, the different influences of
/i/ and /u/ are eliminated from contextually-appropriate stop bursts yielding,
by hypothesis, identical, context-free phonetic tokens. Thus, members of the
second pair should be judged more alike than members of the first.
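
The two competing predictions for a trial of this kind can be sketched in code. This is only an illustrative sketch, not part of the procedure reported here: the names `Syllable`, `bursts_match`, `residuals_match`, and `predicted_pair` are hypothetical, and the "residual" left after compensation is modeled crudely as context-free whenever a burst is heard with the vowel it was originally produced with.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Syllable(NamedTuple):
    burst: str  # vowel with which the [g] release burst was originally produced
    vowel: str  # vowel of the vocalic portion presented to the listener


Pair = Tuple[Syllable, Syllable]


def bursts_match(pair: Pair) -> bool:
    """Acoustic-overlap account: similarity depends on the bursts themselves."""
    return pair[0].burst == pair[1].burst


def residual(s: Syllable) -> str:
    """Assumed toy model of what remains after the vowel's influence is removed."""
    if s.burst == s.vowel:
        return "context-free"
    return f"{s.burst}-colored burst before {s.vowel}"


def residuals_match(pair: Pair) -> bool:
    """Compensation account: similarity depends on the residuals."""
    return residual(pair[0]) == residual(pair[1])


def predicted_pair(trial: Tuple[Pair, Pair],
                   same: Callable[[Pair], bool]) -> Optional[int]:
    """Return 1 or 2 for the pair predicted to sound more similar, else None."""
    first, second = same(trial[0]), same(trial[1])
    if first == second:
        return None
    return 1 if first else 2


# The sample trial: g_i i - g_i u, then g_i i - g_u u
trial_b = (
    (Syllable("i", "i"), Syllable("i", "u")),
    (Syllable("i", "i"), Syllable("u", "u")),
)
print(predicted_pair(trial_b, bursts_match))     # acoustic-overlap account
print(predicted_pair(trial_b, residuals_match))  # compensation account
```

Under these assumptions the acoustic-overlap account selects the first pair and the compensation account selects the second, which is the contrast the 4IAX test is designed to expose.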

This research continues a series of studies reported elsewhere (Fowler,
1983b; Fowler & Smith, in press). The earlier research used the paired
choice reaction-time and 4IAX procedures just described to test listeners'
perceptions of coarticulatory influences of stressed vowels on preceding or
following cross-spliced, unstressed schwa. Predictions based on the
hypothesis that listeners segment speech along coarticulatory lines were
partially supported for these stimuli. We found positive evidence for the
segmentation strategy of Figure 1c (and correspondingly, disconfirming
evidence for the strategy of Figure 1b) when the coarticulatory effects under
study combined both carryover and anticipatory effects of stressed vowels (as
in /ibəbi/ and /abəba/). The reaction-time procedure also provided positive
evidence for contexts in which coarticulatory effects were only anticipatory
(as in /bəbi/ and /bəba/). However, in the 4IAX task, responses were random
when coarticulatory effects were anticipatory only.

We hypothesized that the chance performance in the 4IAX study in which
only anticipatory coarticulation was present was due to a lack of sensitivity
of the 4IAX procedure as compared to the choice reaction-time procedure, and
not to a restriction on the applicability of the segmentation hypothesis to
carryover coarticulatory influences. This interpretation is plausible because
anticipatory coarticulation of stressed vowels is more limited in these
contexts than carryover coarticulation (Bell-Berti & Harris, 1976; Fowler,
1981a), and because, as compared to the choice reaction-time procedure, the
discrimination procedure places severe memory demands on the listener and
requires a difficult judgment. However, it remains to be demonstrated that
segmentation occurs along coarticulatory lines whether the lines reflect
anticipatory or carryover coarticulation.

The present experiment used stimuli in which anticipatory coarticulatory
effects of a vowel on a preceding segment are larger than they were on the
schwas of our previous study. Stimuli in the experiment are the stop-vowel
syllables /gi/ and /gu/. Because the stop immediately precedes the vowel and
because velar stops coarticulate extensively with vowels, I expected these
stimuli to enable observation of segmentation of anticipatory coarticulatory
influences of the vowel from the acoustic domain of the consonant if it
occurs.

Experiment 

Methods

Subjects. Subjects were 36 students at Dartmouth College. All were
native speakers of English and reported normal hearing.

Materials 

Stimuli. Stimuli were two tokens each of the monosyllables /gi/ and /gu/
produced by a female talker. They were input to a New England Digital
minicomputer, sampled at 20 kHz and filtered at 10 kHz.

Based on criteria provided by Dorman, Studdert-Kennedy, and Raphael
(1977) the release bursts of each utterance were identified and segmented from
the remainder of the syllable. Release bursts ranged in duration from 16 to
20 ms and did not vary systematically in duration with the identity of the
vowel. (These values compare to averages across /gid/ and /gud/ of 7.5 ms for
one speaker reported by Dorman et al. and 22.5 ms for the second speaker.) The
period of aspiration following release was removed from the vocalic portion of
the syllable and was replaced by an equivalent period of silence. This was
done to avoid abrupt discontinuities in the spectra when bursts and vocalic
segments were cross-spliced. In the test orders, stimuli were presented in
low levels of white noise, which improved the perceived quality of the stimuli
by masking the temporal discontinuity. The intervals of aspiration ranged
from 6 to 15 ms (as compared to averaged values of 11.5 and 12.5 ms for talkers
in Dorman et al.). Durations of the voiced portion of each syllable were 129
and 130 ms for tokens of /gi/ and 359 and 361 ms for tokens of /gu/.

Three types of test syllables were constructed from the syllable
fragments just described. The four "original" syllables consisted of release
bursts and vocalic portions that had originally been produced together. They
were separated by a period of silence equivalent to the original period of
aspiration for the vocalic segment. "Spliced" syllables were release bursts
from one token of a syllable type attached to a silent interval and vocalic
portion originally associated with production of the other token of the same
syllable type. (That is, for example, a burst from one token of /gi/ was
spliced onto a silent interval and vocalic portion of the other token of
/gi/.) There were four spliced syllables. Eight "cross-spliced" syllables were
created by attaching a release burst from a token of one type onto a silent
interval and vocalic portion associated with a token of the other phonemic
type. (That is, for example, a burst from a token of /gi/ was spliced onto the
vocalic portion of a /gu/ syllable.)
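
The counts of four original, four spliced, and eight cross-spliced syllables follow from pairing every release burst with every vocalic portion. A minimal sketch of that enumeration is given below; the token labels and the `classify` helper are hypothetical names introduced only for illustration.

```python
from collections import Counter
from itertools import product

# Two recorded tokens of each syllable type: (syllable type, token number)
TOKENS = [("gi", 1), ("gi", 2), ("gu", 1), ("gu", 2)]


def classify(burst, vocalic):
    """Label a burst + vocalic-portion combination by how it was assembled."""
    if burst == vocalic:
        return "original"        # fragments produced together
    if burst[0] == vocalic[0]:
        return "spliced"         # same syllable type, other token
    return "cross-spliced"       # burst from the other syllable type


# Every burst combined with every vocalic portion: 4 x 4 = 16 test syllables
stimuli = [(b, v, classify(b, v)) for b, v in product(TOKENS, repeat=2)]
counts = Counter(kind for _, _, kind in stimuli)
print(counts)
```

The sixteen combinations divide into the three stimulus types in the proportions described above.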

Identification test. An identification test presented release bursts,
vocalic portions and whole CVs for identification in that order in separate
blocks of 32 trials.

The identification test was originally presented to 12 naive subjects,
who had not heard the stimuli before, and to 12 subjects who had just
completed the choice reaction-time and 4IAX tests to be described. These 24
subjects were given an answer sheet with alternatives "bee," "dee," "gee,"
"boo," "doo," and "goo" arranged in three blocks of 32 rows, one row for each
trial of the identification test. In the test, both groups of subjects
exceeded chance in their ability to identify the syllables' vowels from their
release bursts alone. This suggested the possibility that some or all of the
subjects heard diphthongal vowels in the cross-spliced syllables. To assess
that, a new group of 12 subjects took the identification test preceded by the
reaction-time and 4IAX procedures. The response sheet given to these subjects
for the identification test allowed six new response alternatives: "bwee,"
"dwee," "gwee," "byoo," "dyoo," and "gyoo" in addition to the original six.

Choice reaction-time test. The choice reaction-time study, modeled after
the paradigm of Whalen (1984), consisted of original, spliced and
cross-spliced stimuli randomly presented one at a time in four blocks of 18
trials. Predictions were that, because the release burst would provide
misleading information for the vowel in cross-spliced stimuli, reaction time
and accuracy to identify the vowel in those stimuli would be inferior to the
same measures taken on original and spliced stimuli.

4IAX discrimination test. The 4IAX test consisted of three blocks of 64
trials. One half of the trials were of type A and one half were of type B,
both illustrated by example below. Stimuli in this test were either spliced
or cross-spliced; no original stimuli were presented. (As before, subscripts
on the [g]s indicate the vowel with which the release burst had originally
been produced.)

Trials of type A were designed to test whether listeners could
distinguish the different bursts in the context of a vowel. As described in
the introduction, trials of type B provided the critical test of the
segmentation hypotheses.

     A:  g_i i - g_i i     g_i i - g_u i

     B:  g_i i - g_i u     g_i i - g_u u

In either trial type, four stimuli were presented per trial, arranged
temporally in two pairs. Members of one pair of an A trial were identical.
One member of the second pair was identical to the members of the first pair.
The fourth syllable differed from the others in its release burst. That
syllable was always cross-spliced. Trials of type B were like trials of type A
except that the vocalic segments within a pair were different. In B trials,
then, the members of one pair had identical release bursts. In the other
pair, one item had the same release burst as the members of the
first-mentioned pair; the other had a different release burst. In pairs where
bursts were identical, one member of the pair was spliced and one
cross-spliced. For the pair with different release bursts, both members were
spliced so that the bursts were in vocalic contexts compatible with those in
which they had originally been produced.

In a trial, the offset-to-onset time was 200 ms within a pair; between
pairs it was 500 ms. If the stimuli in the sample trials above are labeled 1,
2, 3 and 4, then their ordering in the sample trials is 12-34. In addition to
this ordering were equal numbers of occurrences of orders 21-43, 34-12 and
43-21. In the sample trials above, release burst [g_i] occurs three times
and burst [g_u] occurs just once. There were equal numbers of trials in
which [g_u] was the more frequent burst in the trial.
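
The counterbalancing of presentation orders can be sketched as follows. This is a minimal sketch under an assumption: that the four orders are the sample order, the order with both pairs reversed internally, the order with the two pairs exchanged, and the order with both changes applied. The function names `orderings` and `label` are hypothetical.

```python
def orderings(trial):
    """Return the four assumed counterbalanced orders of a two-pair trial."""
    (a, b), (c, d) = trial
    return [
        ((a, b), (c, d)),  # sample order
        ((b, a), (d, c)),  # members reversed within each pair
        ((c, d), (a, b)),  # pairs exchanged
        ((d, c), (b, a)),  # pairs exchanged and members reversed
    ]


def label(order):
    """Render an order in the text's notation, e.g. '12-34'."""
    return "-".join("".join(str(item) for item in pair) for pair in order)


sample = ((1, 2), (3, 4))
labels = [label(order) for order in orderings(sample)]
print(labels)
```

Applying the same four orders to every trial keeps the pair containing the odd release burst equally often in first and second position, so a bias toward either position cannot masquerade as a similarity judgment.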

Procedure

Group 1. The twelve subjects in this group took only the identification
test. They listened to stimuli over headphones. Stimuli were presented
on-line on a New England Digital minicomputer. In this test as in the others
the stimuli were mixed with a low level of white noise.

Subjects were told that stimuli on the first third of the test were the
first few milliseconds of a CV syllable and their task was to guess the
identity of the whole syllable from the fragment. The syllable types they might
hear were pronounced for them. They were instructed to circle the response
choice on the answer sheet that best represented the syllable from which the
fragment had been excised. They were required to guess if necessary. In
addition, they were told that they might hear all or only some of the
syllables represented on the answer sheet. Therefore, they should circle their
best guess based on what they heard and not attempt to distribute their
responses evenly among the response alternatives. On the second block, they
were told that the stimuli were the remainders of the CV syllables with the
first few milliseconds excised. Instructions were the same as on the first
block. Finally, on the third block, they were told that stimuli were the two
types of syllable fragments they had just been listening to, but rejoined to
make a whole CV syllable. Instructions were to identify the CV on each trial
as one of the six listed on the answer sheet.

Trials were initiated individually by key press; therefore, subjects had 
unlimited time to make their responses. 

Groups 2 and 3. The twenty-four subjects in these groups took the choice
reaction-time test, the 4IAX test and the identification test in that order.
The procedures for these groups were identical; they differed only in the
response sheets they received on the identification test.

In the reaction time procedure, subjects listened over headphones to
stimuli presented on-line and mixed with noise as in the identification
procedure just described. They were instructed to identify the vowel in each
syllable as "ee" or "oo" by hitting the appropriate labeled key on the computer
terminal's keyboard as quickly and as accurately as possible. They received
response-time feedback after every trial and averaged response times and
accuracy at the end of each of the four blocks of trials. They were asked to
keep their accuracy above 90%. The first block of trials served as practice.

In the 4IAX procedure, subjects were instructed to choose the first or
second pair of stimuli on each trial as having the more similar members. They
signaled their selection by typing "1" or "2" into the computer, using the
calculator pad on the keyboard. They followed that selection with a
confidence judgment (1: guess, 2: intermediate certainty, 3: high level of
confidence). Neither response was timed.

In this test there were three blocks of 64 trials, the first block serving
as practice. Trials were self-paced and there was no feedback.

Last in the session, subjects took the identification test.

Results

Identification. Identifications of bursts, vocalic portions and original,
spliced and cross-spliced CVs are provided in Table 1 for all three groups of
subjects. Consonant and vowel identifications are displayed separately.

In identification of consonants from isolated bursts, subjects are close to
chance in Group 1 (naive listeners). Performance for experienced subjects,
particularly those in Group 3, is better than chance. More remarkable is
performance identifying the vowel from the burst. All groups exceeded chance on
this identification task.³

As for the isolated vocalic portions of the syllables, the vocalic portion
of [gu] led to predominantly "b" identifications in all groups. In contrast,
the vocalic portion of [gi] evidently was more ambiguous, leading to
substantial numbers of identifications in all consonantal response categories.
Vowel identifications based on vocalic portions of the syllables were
accurate.

Subjects in all groups were accurate in identifying the vowels and
consonants of original and spliced whole syllables. As for cross-spliced
syllables, "g" was the predominant identification in [gᵤi], but, in two groups,
"b" was the predominant consonant identification for [gᵢu], a finding also
reported by Cole and Scott (1974; for a related finding, see Liberman,
Delattre, & Cooper, 1952). Subjects in Group 2 gave predominantly "d" responses
for the consonant in the latter syllable.

Subjects in Group 3 did report more diphthongs in the cross-spliced
syllables than elsewhere, particularly in the syllable [gᵤi]. These data, as
well as those for subjects reporting "b"s or "d"s in cross-spliced syllables,
will be used later to examine individual subjects' performances in the choice
reaction time and 4IAX procedures.

Overall, this test provides two pieces of information necessary to the
interpretation of the next two tests. First, despite the surgery performed on
the syllables, the bursts are integrated with the vocalic portions
sufficiently in whole CVs that consonant identifications based on the two
fragments together are different from identifications based on the separated
parts. Second, the identification test provided a finding that we had expected
to uncover only in the reaction-time procedure, namely, that listeners are
sensitive to the information for the following vowel in the release burst. This
led to the only effect that the burst appeared to have on vowel identification
in the identification test. Some subjects in Group 3 identified the vowel as
diphthongal.

Table 1

Identifications (Proportions of Responses) of Bursts, Vocalic Portions and
Whole Syllables by Naive (Group 1) and Experienced Listeners (Group 2) with 6
Response Choices and by Experienced Listeners (Group 3) with 12 Alternatives

Choice Reaction Time



Table 2 provides response times and accuracies in the choice reaction
time procedure. In Table 2a, means are collapsed over the subjects in Groups
2 and 3. Although subjects in Group 3 responded more rapidly (by an average
of 70 ms) and more accurately (by an average of 5%), response patterns and
outcomes of separate ANOVAs performed on the data from each group were the same.



Table 2

Reaction Times (in ms) and Accuracy (Percent Correct) for Groups 2 and 3 in
the Choice Reaction Time Study

                   RT      Accuracy      RT      Accuracy

Original           455     93            4*      98
Spliced            453     93            425     98
Cross-spliced      50^4    73            520     86



Reaction times and accuracy were subjected to separate two-way analyses
of variance with factors syllable type (original, spliced, cross-spliced)
and vowel (/i/, /u/). In the analysis of reaction times, the main effect of
syllable type was significant, F(2,46) = 29.40, p < .001, reflecting the
substantially longer response times to cross-spliced as compared to spliced
and original syllables. In addition, the interaction of vowel and syllable
type reached significance, F(2,46) = 4.05, p = .02, because the slowing caused
by cross-splicing was more marked for the syllable [gᵤi] than for [gᵢu].

The accuracy measure provided a compatible outcome, with performance
lowest in cross-spliced as compared to original and spliced syllables,
F(2,46) = 20.87, p < .001. In this analysis, the interaction did not reach
significance.

The identification test had revealed that some subjects heard diphthongal
vowels in cross-spliced syllables, particularly in [gᵤi]. This provides an
alternative account of the slowing on cross-spliced stimuli. If subjects hear
diphthongs, then, as predicted, they hear the vowel information in the burst;
their reaction times to cross-spliced stimuli are slowed, however, because the
perceived vowels include both response alternatives and subjects have to
choose just one. Subjects who do not report diphthongs may also extract vowel
information from the bursts in cross-spliced syllables, yet still hear the
syllable vowel as monophthongal because later vocalic information
overwhelmingly contradicts information in the burst. This latter was the
possibility the experiment had been designed to establish and test.

Post-hoc analyses of responses by individual subjects in Group 3 were
performed to determine whether subjects responded differently depending on
whether they heard the vowel as monophthongal or diphthongal. For the
syllable [gᵤi], seven subjects consistently reported diphthongs in the
identification test, three consistently reported monophthongs and two reported
some of each. (Consistency in identification was defined operationally as
selection of a diphthongal [monophthongal] response on at least 6 of 8
opportunities on the identification test.) For the syllable [gᵢu], numbers of
subjects falling into the three categories were two, ten, and zero,
respectively. Some subjects fell into the same category twice, because they
heard the vowel in the same way on both syllables. In those instances, their
data for the two syllables were pooled. Average response times and accuracy
were collapsed over syllables for the seven subjects consistently reporting
diphthongs in Group 3 and separately for the ten subjects reporting
monophthongs. An analysis of variance comparing the two groups on the original,
spliced, and cross-spliced stimuli yielded a highly significant effect of
splicing condition, F(2,30) = 39.09, p < .001, but no effect of subject group
and no interaction (both Fs less than one). It seems that whether or not
subjects experience the anticipatory vowel information in the burst as a glide,
it serves them as information for a vowel and, in cross-spliced stimuli,
subjects are misled by it.
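
The operational criterion for consistency (at least 6 of 8 opportunities) can be written out as a small sketch. The function name and category labels are hypothetical, and the sketch assumes that every response on the 8 opportunities was either diphthongal or monophthongal, which the report does not state explicitly.

```python
def classify_listener(diphthong_responses, opportunities=8, criterion=6):
    """Classify one listener on one cross-spliced syllable.

    diphthong_responses: number of diphthongal responses out of `opportunities`.
    Assumes (for illustration) that all remaining responses were monophthongal.
    """
    if not 0 <= diphthong_responses <= opportunities:
        raise ValueError("response count must lie between 0 and opportunities")
    monophthong_responses = opportunities - diphthong_responses
    if diphthong_responses >= criterion:
        return "diphthong"       # consistently diphthongal
    if monophthong_responses >= criterion:
        return "monophthong"     # consistently monophthongal
    return "mixed"               # some of each


if __name__ == "__main__":
    for count in (7, 4, 1):
        print(count, classify_listener(count))
```

Under these assumptions, a listener with 7 diphthongal responses is classed as consistently diphthongal, one with 4 as mixed, and one with 1 as consistently monophthongal.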

4IAX

Table 3 provides the outcome of the 4IAX test collapsed over subjects in
Groups 2 and 3. The data were collapsed over the groups because analyses
performed on the individual groups did not differ.

Table 3

Outcome of the 4IAX Test Collapsed Over Subjects in Groups 2 and 3

           Cross-spliced Syllable*     Response Selection**
                                       Acoustically   Acoustically
Trials     [gᵤi]      [gᵢu]            Identical      Different

A                     .82              2.48           1.91
B          .3^        .27              1.86           2.25

*Proportion of A and B trials in which listeners selected syllables having
acoustically identical bursts as more similar than syllables having
acoustically different bursts. **Confidence judgments.



As predicted, on A trials, listeners reliably chose syllables having
acoustically identical bursts in their proper contexts as more similar than
syllables having acoustically different bursts in identical contexts (/gu/:
t(23) = 18.34, p < .001; /gi/: t(23) = 11.84, p < .001). This verifies that
the anticipatory coarticulatory information is audible in the context of a
syllable.

Of greater interest is performance on B trials. On these trials,
listeners compared syllables with acoustically different bursts, each in their
proper coarticulatory contexts (e.g., [gᵢi]-[gᵤu]), to syllables with
acoustically identical bursts, one in its original context and one not (e.g.,
[gᵢi]-[gᵢu]). As predicted, on these trials, in contrast to A trials,
listeners reliably selected the syllables with different bursts as more
similar than those with identical bursts (/gu/: t(23) = -3.26, p = .004; /gi/:
t(23) = -3.96, p < .001).
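
The comparisons reported for A and B trials appear to have the form of a one-sample t test of each listener's proportion of "identical-burst" selections against the chance value of .5 (the 23 degrees of freedom match the 24 listeners in Groups 2 and 3). A minimal sketch of that computation, under this assumption, follows. The function name is hypothetical, and the proportions in the example are invented for illustration; they are not data from the study.

```python
import math
import statistics


def one_sample_t(proportions, chance=0.5):
    """Return (t, df) for a one-sample t test of the mean against `chance`.

    proportions: one value per listener, e.g. the proportion of trials on
    which the pair with acoustically identical bursts was chosen.
    """
    n = len(proportions)
    if n < 2:
        raise ValueError("need at least two listeners")
    mean = statistics.mean(proportions)
    sd = statistics.stdev(proportions)      # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("t is undefined when all values are identical")
    standard_error = sd / math.sqrt(n)
    return (mean - chance) / standard_error, n - 1


if __name__ == "__main__":
    # Invented proportions for four hypothetical listeners on B trials.
    t, df = one_sample_t([0.25, 0.35, 0.30, 0.30])
    print(f"t({df}) = {t:.2f}")
```

A negative t, as on B trials, indicates that listeners chose the pair with identical bursts less often than chance; a positive t, as on A trials, indicates the opposite.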

As shown in Table 3, confidence judgments mirror the response selections.
The confidence judgments in Table 3 are collapsed over syllable type. This
was necessary because subjects occasionally had no responses either in the
"acoustically different bursts" category on A trials or in the "acoustically
identical bursts" category on B trials. No subject had missing data when the
data were collapsed over [gᵢ] and [gᵤ] trials. On A trials, subjects are
more confident of their (correct) judgments that syllables with identical
bursts are more similar than those with different bursts. On B trials, their
confidence reverses. A two-way analysis of variance (trial type [A, B] by
judgment [syllables with acoustically identical bursts, those with different
bursts]) was performed on the confidence judgments. In that analysis, the
effect of trial type, F(1,23) = 5.02, p = .03, and the interaction, F(1,23) =
39.33, p < .001, were significant. The significant interaction reflected the
effect of interest. Listeners were more confident of their correct selections
of syllables having acoustically identical bursts on A trials than of their
errors, F(3,23) = 9.31, p < .001; on B trials, they were less confident of
their selection of those having acoustically identical bursts than of their
selection of different bursts in their proper contexts, F(3,23) = 4.28, p =
.02.

Response selection by individuals hearing diphthongs was examined
separately from individuals hearing monophthongs. The average performance on
B trials of the eight subjects reliably hearing diphthongs did not differ from
that of the ten subjects hearing monophthongs, t(16) = 1.07, p = .30.

It is also of interest to look separately at subjects for whom
cross-splicing changed the identity of the consonant to /b/ or /d/ and those
for whom it did not. For subjects of the first type, the 4IAX task confronts
them with an easy between-category discrimination. For subjects in the second
category, the task is one of within-category discrimination.

For these analyses, data from Groups 2 and 3 were pooled. In all, there
were 15 subjects who reported the syllables with the cross-spliced burst
reliably as /b/- or /d/-initial in at least one syllable. All but two of these
were subjects in the condition with cross-spliced [gᵢ]. Across Groups 2 and
3 there were 19 subjects reliably reporting "g" on at least one syllable.
Performance differences were significant between these two groups, as expected
from the general finding that between-category discrimination is easier than
within-category discrimination, t(32) = 2.92, p < .01. However, subjects
hearing /b/ or /d/ were not wholly responsible for the outcome on B trials.
Of those 15 subjects, 1\J had performance levels below .5, t(14) = -5.76, p <
.001. Of the 19 subjects hearing /g/, 12 showed the predicted direction of
difference, t(18) = -2.02, p = .056. We conclude, then, that although the
within-category discrimination is much more difficult than the
between-category discrimination, it is not qualitatively different from
between-category discrimination. Overall, subjects hear syllables with
acoustically different bursts in their proper coarticulatory contexts as more
similar than those with acoustically identical bursts, one in its proper
context and one not; making the discrimination at all is facilitated if the
segmentation process leads the cross-spliced burst to fall into a different
phonemic category than its original one.

Discussion

In this study, as in the earlier research reported by Fowler and Smith
(in press), subjects' choice reaction-time and discrimination performances
reflect the segmentation strategy of Figure 1c more closely than that of Figure
1b. Listeners use coarticulatory information as information for the
influencing segment, and they do not integrate it into their perceptual
experience of the segment with which it co-occurs in time. The present study
extends the findings of Fowler and Smith to anticipatory coarticulatory
influences and to coarticulatory relationships of consonants and vowels.

The segmentation of speech that our research supports closely resembles
that achieved by a recent computer model of speech perception described by
Elman and McClelland (1983). In their model (cf. McClelland & Rumelhart, 1981),
features, phonemes and words are represented by "nodes" interconnected by
excitatory and inhibitory links. In general, excitatory connections link
nodes that are mutually consistent; inhibitory connections link nodes that
are inconsistent. (For example, phoneme nodes excite words of which they are
constituents; word nodes inhibit each other.) Acoustic information input to
the model activates features compatible with it; in turn, the features
activate phonemes consistent with them, and phonemes activate words. Of
particular interest here is the segmentation of the acoustic signal that the
model achieves over time as it identifies phonetic segments from an acoustic
speech signal. Over time, the acoustic signal first provides stronger and
then weaker evidence for the presence of a particular phonetic segment. I
have called that waxing and waning of information the "prominence" pattern for
a segment. In the model of Elman and McClelland, the activation pattern for a
phonetic segment tracks the waxing and waning of information for the segment
in the acoustic signal.
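
The idea that activation tracks a waxing and waning prominence pattern can be pictured with a toy sketch. It is not Elman and McClelland's model and implements none of its nodes or connections. It simply assumes a bell-shaped prominence curve for each of two segments, a consonant and a following vowel, and lists the time frames in which both are strongly supported at once. The curve shape, peak positions, width, number of frames and threshold are all invented for illustration.

```python
import math


def prominence(frame, peak, width=5.0):
    """Assumed bell-shaped evidence for one segment at a given time frame."""
    return math.exp(-((frame - peak) ** 2) / (2.0 * width ** 2))


def concurrent_frames(n_frames=20, consonant_peak=5, vowel_peak=12, threshold=0.5):
    """Return the frames in which both segments exceed the threshold."""
    frames = []
    for frame in range(n_frames):
        consonant = prominence(frame, consonant_peak)
        vowel = prominence(frame, vowel_peak)
        if consonant > threshold and vowel > threshold:
            frames.append(frame)
    return frames


if __name__ == "__main__":
    print(concurrent_frames())
```

With these invented settings, both segments are above threshold in frames 7 through 10, while in the final frame the vowel's prominence exceeds the consonant's.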

Due to coarticulation, in most time frames, the model receives featural
information consistent with two phonetic segments concurrently, for example, a
syllable-initial consonant and a following vowel. When that happens, two
phonemes are highly activated concurrently. Eventually information for the
first segment dies out, leaving the highly activated second segment. The
activation patterns for a sequence of phonemes, therefore, resemble the
prominence curves represented in Figure 1a. Thus, in the model, although there
is no explicit segmentation process separate from the process of identifying
phonetic segments, nonetheless, a segmentation of the signal is achieved, and
it is precisely the segmentation that I have found characteristic of human
listeners. In the present study, listeners begin using acoustic information
for a segment as such whenever it occurs in the speech signal. This leads to
a reaction-time advantage for original and spliced over cross-spliced stimuli
in the choice reaction-time study. If the information is coarticulatory, they
do not integrate it with information for a segment with which it co-occurs in
time. This leads to the findings in the 4IAX study.⁴

The model of Elman and McClelland would not achieve the segmentation it
does if the acoustic signal did not support it. It has not been obvious that
the signal does support this segmentation, however, because visible displays
of the signal do not invite it; indeed, acoustic analyses guided by visible
displays have not achieved it. (This is true not only of segmentations used
in the phonetics literature as described in the introduction; it also appears
to characterize segmentations described by naive subjects learning to read
spectrograms based on a whole-word training procedure [Greene, Pisoni, &
Carrell, 1984].) It will be important for future research to make explicit the
relationship between the listeners' and the model's segmentation of the
acoustic speech signal on the one hand and the support for it that the signal
provides on the other.

One step further back in the chain of communication, the acoustic speech
signal could not reliably give rise to the segmentation it does without
support from the talkers' articulations. That is, the gestures corresponding to
a given phonetic segment must, in some sense, cohere in articulation and those
corresponding to different segments must, in the same sense, be separable.⁵
This line of reasoning in turn suggests that Hockett's (1955) often-cited
Easter-egg analogy is misleading. Hockett compared the effects of
coarticulation on phonetic segments to a process of sending a row of Easter
eggs through a wringer. His analogy reflected the view, still current (cf.
MacNeilage & Ladefoged, 1976), that coarticulation destroys both the coherence
of individual phonetic segments and their separation one from the other.
Necessarily, then, the acoustic signal cannot be supposed to provide sufficient
information, in itself, to support perception of the segments; rather, phonetic
identifications must be interpretations imposed on the signal by a listener
(cf. Studdert-Kennedy, in press).

The present findings and the behavior of Elman and McClelland's computer
model render this perspective on articulation doubtful, however. In view of
that, it is not surprising that research on articulation suggests a picture
tidier than Hockett's analogy implies. For example, studies by Öhman (1966),
Carney and Moll (1971), Butcher and Weiher (1975), and Barry and Kuenzel
(1976) agree in showing that vowel-to-vowel movements of the tongue body occur
before, throughout and after the production of an intervocalic consonant in a
VCV production. Öhman's interpretation is that, in VCVs, consonantal gestures
are superimposed on ongoing diphthongal vowel-to-vowel gestures. In this
type of utterance, then, coarticulation does not destroy the coherence of
features of individual phonetic segments or the separation among distinct
segments as the Easter-egg analogy implies. Indeed, rather than being an
irrecoverable smearing of consonantal and vocalic gestures, in these utterances
coarticulation is the overlapping occurrence of two distinct types of
gestures: one for the vowel-to-vowel movements and one for the consonantal
gestures.

This research on C-V coarticulation converges with other production
research, in which segment durations are measured. In that literature, vowels
are measured to shorten as consonants are added to a syllable, and in similar
fashion stressed vowels are measured to shorten as unstressed vowels are
added to a word or stress foot (e.g., Fowler, 1977, 1981b; Lindblom & Rapp,
1973). Data in Fowler (1981b) suggest, however, that at least some of the
measured shortening is not articulatory shortening in fact, but rather
reflects the sort of articulatory overlap reported by Öhman and others (and
illustrated in Figure 1). It is identified as shortening only because
measurement conventions do not include, as part of a vowel's duration, those
parts of its coarticulatory extent where another segment predominates in the
signal. Together with the articulatory measures, the shortening measures
further support the hypothesis that consonants and vowels (and stressed and
unstressed vowels) are nondestructively overlapped in production in a way
consistent with the perceptual segmentation of the acoustic signal that the
present research and that of Fowler and Smith (in press) suggest.

The production research just described and our interpretation of the
present findings both predict that the perceived duration of a phonetic segment
should exceed its measured duration, corresponding instead approximately to
its coarticulatory extent. A similar expectation can be derived from Liberman
and Studdert-Kennedy's discussion (1978) of reasons why coarticulation may be
necessary for perceivers.

In their view, talkers have to produce speech that meets two competing
requirements. First, because meanings of grammatical utterances have to be
extracted from grammatically coherent groups of words, and cannot be determined
word-by-word, speech may have to be transmitted at a rapid rate. The listener
has to be able to remember the beginning of a syntactic phrase at the time the
end of it is produced. Second, however, the rate cannot exceed that at which
listeners are no longer able to determine the order of sequences of sounds
(Warren, 1976). Liberman and Studdert-Kennedy point out that coarticulation
allows relatively long-duration segments to occupy relatively short intervals
of time. However, this would only be a perceptual advantage to a listener who
heard coarticulatory overlap as overlap rather than as context-sensitivity of
discrete phonetic segments. In recent work, I have found some evidence that
the perceived duration of a vowel does indeed exceed its measured duration
(Fowler, 1983a).

Together, the research and theoretical considerations outlined here
suggest a coherent perspective on the production and perception of speech
(cf. Fowler, 1983a, 1983c). Talkers produce phonetic segments in overlapping
time frames. The articulatory overlap, however, does not smear the segments;
rather it preserves the coherence of the temporally extended parts of an
individual phonetic segment and the separation of distinct segments.
Compatibly, the acoustic signal provides information for the separation of
overlapping segments and the coherence of temporally extended parts of a
segment. Finally, listeners segment the signal realistically, recovering the
segments that talkers produce.

Footnotes



¹Pisoni (in press) marshals evidence from a variety of sources (from
linguistics: synchronic and diachronic phonological regularities, systematic
alternations among morphological relatives in the lexicons of languages; from
psychology: language games, alphabetic writing systems and speech errors)
converging on conclusions that phonetic structure is psychologically real and
that it plays an important role in language use.

²We will use the word "compensation" to label findings that contextual
influences are not integrated with information for a segment with which they
co-occur in time. We do not intend to imply any active process of compensation
by perceivers, however. Our hypothesis (see also Elman & McClelland, 1983)
is that compensation is a by-product of segmentation, which itself is a
necessary consequence of phone identification.

³Subjects in Group 3 performed better than those in Group 2 on all three
tests. Subjects in the two groups were recruited from the same type of
population (an Introductory Psychology class) and received the same
instructions. The relevant difference between them, I think, is that subjects
in Group 2 were recruited at the end of one academic term and those in Group 3
at the beginning of the next one. Individuals who look for extra credit at
term's onset may be more highly motivated overall than individuals who seek it
at term's end.

⁴There is a possible difference in the view of segmentation depicted in
Figure 1c and that achieved by the model of Elman and McClelland. In the
model, segmentation is a by-product of procedures for identifying the component
phonetic segments of an utterance. The strategy of Figure 1c could be, but
is not necessarily or explicitly, a strategy for identifying the phones; it is
essentially a strategy for keeping separate the information for different
phones that is provided in overlapping time windows. The findings of the
present study appear to be fully compatible with the model of Elman and
McClelland, however.

⁵Research by Abbs and Gracco (in press) and by Kelso, Tuller, V.-Bateson,
and Fowler (1984) provides preliminary evidence for this. In the research of
Kelso et al., a subject's jaw is perturbed as it closes for the final consonant
of /bæb/ or /bæz/. If the utterance is /bæb/, compensatory movement by the
upper lip achieves the lip closure necessary for the bilabial segment. No
reactive activity is found in the tongue, which is not involved in /b/
production. Thus, two articulators involved in the production of an individual
segment are found to be coupled; an articulator involved in production of other
segments is not. A different outcome is observed when the final consonant is
alveolar /z/. There, the tongue does compensate for jaw braking during
closure; although excitatory lip activity is observed in this case, no lip
movements occur. Again, two articulators involved in the production of a single
segment are functionally coupled in articulation. They are not coupled to
articulators required to produce different segments.

INTRINSIC TIME IN SPEECH PRODUCTION: THEORY, METHODOLOGY, AND PRELIMINARY
OBSERVATIONS*

J. A. S. Kelso† and Betty Tuller††



Abstract. A continuing challenge to our understanding of speech
production and perception is the fact that utterances with markedly
different acoustic, kinematic, and electromyographic characteristics
can nevertheless be perceived as the "same" word. In this paper, we
discuss the importance of examining articulation relative to an
intrinsic, activity-defined metric and show how such an analysis of
intervocalic consonant timing across different speaking rates and
stress patterns significantly reduces both interspeaker and
intraspeaker variability. Next we explore whether the observed
relative temporal stability can be achieved without reference to an
extrinsic clocking device, but rather in terms of the dynamic topology
of the system's behavior. To this end, using a phase plane
description of articulatory motion, we show how the temporal analysis
originally offered can be redescribed in terms of critical
position-velocity states (or, in polar coordinates, phase angles) for
interarticulator cooperation. Such coordination, we propose, can be
captured in terms of events that are intrinsic to the system's
dynamics, not in terms of conventional durational metrics.



1.0 Introduction

One primary focus of speech production research has been to understand how
the many articulatory degrees of freedom are temporally organized. In
traditional (and many current) theories, the problem is "explained" by
invoking the notion of a program, a representation of the behavior coded in
some mental or neural device that exists before the behavior is realized in
the real world (see Kelso, 1981; Kelso, Tuller, & Harris, 1983; Kugler, Kelso,
& Turvey, 1980, for criticisms). The function of the temporal program is to
instruct the articulators when to become active and for how long. For
example, in the early, influential theories proposed by Kozhevnikov and
Chistovich (1966) and Lindblom (1963), execution of discrete linguistic units
was thought to be triggered by an independent "rhythm generator" or "timing
program," which timed the units with respect to one another. More recently,
so-called central pattern generators have been thought to be the neural
embodiment of timing programs.

This class of theory can be termed indicational (cf. Reed, 1981). That
is, the role of the plan is to indicate, instruct, or command the articulators
how and when they should be active. The emphasis of indicational theories is
placed firmly on the symbolic mode of description with little or no attention
paid to the detailed dynamical processes that the symbolic mode is said to
indicate or direct. To use a favorite example (cf. Pattee, 1977), a stop sign
indicates to a driver that the car should be stopped, but provides no detailed
information about how to stop the car, i.e., how, where, and by how much to
decelerate, apply the brakes, etc. Thus, the symbolic or indicational mode
greatly underdetermines the information actually required to perform an
activity. In the case of speech, indicational theories pay no regard to the
dynamical behavior of the articulatory system, i.e., the ordered motions of
the articulators in space and time. The timing program or rhythm generator
concept emphasizes the symbolic, indicational mode and provides no account of
how the multiple degrees of freedom of the articulatory system are actually
coordinated in the course of an activity.

Indicational theories (which are pervasive in biology and psychology) not
only ignore, in large part, dynamical processes but also lack a rationale for
how it is and by what means one particular symbol string is created rather
than another (cf. Kugler, Kelso, & Turvey, 1980; 1982; Turvey & Kugler, 1984).
In speech research, different variants of an indicational theory include
features, segments, or syllables as candidates for units of articulation. What
is missing, then, is an account of the symbolic mode that is not arbitrary
with respect to the dynamics that it instructs. The origins of the symbolic
mode must, it seems, be lawfully derived from dynamics.

The foregoing arguments serve to focus attention on dynamics as a source
of understanding natural activities, such as speaking, whose spatiotemporal
organization is the main concern of this paper. Dynamics, by definition,
means simply motion and change in space/time. Maxwell (1877) described
dynamics as the "simplest and most abstract description of the motion of a
system." Thus, and this is important, there is no logical reason why
dynamics, although rate-dependent, cannot be conceived of as abstract. One
reason why dynamics has been undervalued is that it has been interpreted as
local and concrete (purely biomechanics?) rather than global and abstract.
Yet recent developments in the field of dynamical systems indicate otherwise:
Systems possessing huge numbers of dimensions can be abstractly characterized
in a low-dimensional space (cf. Abraham & Shaw, 1982; Haken, 1983). Moreover,
it is certainly possible, in principle, to characterize the global behavior of
a dynamical system using symbolic representations (e.g., Crutchfield &
Packard's, 1983, "symbolic dynamics"). Here again, however, the question of
the non-arbitrary or privileged coupling between a symbolic representation of
the dynamics and the dynamics itself remains.

In this paper, we shall try to accomplish three goals, all of which
relate to a dynamic perspective on speech production with particular emphasis on
its temporal aspects (see also Kelso & Tuller, 1984a, 1984b). First, we will
review some of our recent research, which reveals that the relative timing
among articulatory events is a significant index of the stable performance of
the speech motor system across suprasegmental changes in stress and speaking
rate. This ubiquity of relative timing, not only in speech production but
other activities as well, raises a number of issues about the role of time in
biological systems (e.g., conventional questions about how the system meters
and monitors time, how duration is controlled, etc.). The whole question of
time has been a subject of fascination since the dawn of modern thought.
Aristotle is said to have associated time with motion, yet also advocated "a
soul which counts" (cf. Prigogine, 1984). In the second main part of the
paper, we will argue that the soul that counts (today's temporal program?) can
be usefully replaced by a view of time that is intrinsic to a sequence of
events, and whose units are defined entirely in terms of the state variables
of the system.

Fowler (1977; see also Fowler, Rubin, Remez, & Turvey, 1980) has
proposed an intrinsic timing account of speech production that, though appealing,
has lacked an empirical methodology for its detailed evaluation. In the third
part of the paper, we will ground the notion of intrinsic time empirically by
recasting our relative timing data into a phase plane description of
articulator trajectories, in which time itself does not appear explicitly. In
our final remarks we will address the advantages, both theoretical and
experimental, of this methodology in which intrinsic time is revealed by the
geometry of the system's dynamical behavior. This dynamical perspective
stresses different observables and motivates a simpler and more elegant
account of relative timing in speech production.

2.0 Relative Timing of Articulatory Gestures 

Our basic intuition is that it is probably incorrect to assume, as
conventional accounts do, that speech timing is based on standard temporal
units, such as milliseconds. This intuition is strengthened by well-known
facts: namely, that the absolute duration of individual electromyographic,
kinematic, and acoustic speech events can change dramatically as a function of
speaking rate, syllable stress, and phonetic context, among other things, yet
the perceptual identity of constituents is preserved (e.g., Fry, 1955; 1958;
Lehiste, 1970; Lindblom, 1963). This suggests that time might usefully be
measured in relative, rather than absolute, temporal units that are, in a
sense, "normalized" to the activity being performed.

The notion that the relative timing of articulatory events provides a
more appropriate metric than their absolute durations is mirrored by the
importance of relative acoustic durations for perception of speech. For
example, the duration of the interval between release of supraglottal occlusion
and the onset of glottal pulsing, the so-called voice onset time, is a strong
cue to the voicing category of a stop consonant (Lisker & Abramson, 1964).
However, perception of the voicing category does not switch when some absolute
interval duration of voice-onset-time is reached. Rather, the category
boundary is perceived relative to the overall speech rate (Summerfield, 1975; see
also Port, 1979). Similarly, the duration of formant transitions is a strong
distinguishing cue between perception of /b/ and /w/ in word initial position
(Liberman, Delattre, Gerstman, & Cooper, 1956). But again, the absolute
duration of these transitions is less important than the transition duration
relative to syllable duration (Miller & Liberman, 1979).

Other evidence that time is relative in motor control comes from analyses
of non-speech activities in which the timing of individual events appears to be
constrained relative to a longer, activity-defined period. This relative
temporal stability of muscle activities, or kinematic events, is apparent over
scalar changes in rate or force of production that often result in large
variations in absolute duration. Although early demonstrations of relative
temporal stability were provided from activities that are qualitatively
repetitive and potentially pre-wired (e.g., locomotion, respiration, and
mastication; see Grillner, 1977, for review), more recent work has revealed that
less repetitive activities show the same organizational features (e.g.,
two-handed movements, typing, handwriting, postural control, and speech-manual
coordination; Hollerbach, 1981; Kelso, Southard, & Goodman, 1979a, 1979b;
Kelso, Tuller, & Harris, 1983; Lestienne, 1979; Nashner, 1977; Schmidt,
1982; Shapiro, Zernicke, Gregor, & Diestel, 1981; Viviani & Terzuolo, 1980).

In recent papers we have presented data suggesting that the production of
speech can be described by a similar style of organization (Harris, Tuller, &
Kelso, in press; Tuller & Kelso, 1984; Tuller, Kelso, & Harris, 1982;
1983). This demonstration should be of some interest to neuroscience and
neuropathology, because it supports the idea that a single set of
organizational principles may underlie very different motor skills, including one as
highly symbolic as human speech (e.g., see contributions by Grillner, Evarts,
and Granit in Grillner, Lindblom, Lubker, & Persson, 1982; Ostry & Cooke,
this volume; Ostry, Keller, & Parush, 1983). The data are also interesting,
we believe, because they highlight the importance of a commensurate vocabulary
for the symbolic description of an activity, and the activity itself, and
offer an activity-sensitive metric for measuring timing in speech production.

In our work on speech production, we varied two suprasegmental aspects of
speech that are believed to be particularly important (syllable stress and
speaking rate) in an effort to discover underlying articulatory invariance
across speech segments. We approached this problem by examining kinematic and
acoustic recordings of speakers' productions of two-syllable nonsense
utterances embedded in the carrier phrase "It's a ___ again." The utterances under
discussion here were /ba#Cab/ and /bae#Cab/, where the medial C was from the
set /b, p, w/. Half of the tokens were produced with primary stress placed on
the first syllable and half were produced with primary stress on the second
syllable. Twelve repetitions of each utterance were produced at each of the
two stress patterns and at two self-selected speaking rates, conversational
and somewhat faster.

Kinematics of articulatory movements were monitored in the up-down
direction using an optical tracking system that followed the movement of
lightweight, infrared, light-emitting diodes (LEDs) attached to the subject's lips,
jaw, and nose. In order to minimize head movements during the experiment,
output of the LED on the nose was displayed on an oscilloscope placed directly
in front of the subject, who was told to keep the display on the zero line.
The movement records were recorded on multichannel FM tape for later computer
analysis. For each token, displacement maxima and minima (corrected for head
movements) and the times at which they occur were obtained individually for
the jaw, the upper lip, and the lower lip.

The spatiotemporal coordination among articulator events was analyzed to
determine whether stable relative timing of kinematic events is characteristic
of speech. This analysis requires demarcation of some period of articulatory
activity and the latency of an articulatory event within the defined period.
Although we examined relative timing of nine different articulatory events,
linguistic evidence suggests that temporal stability among phonetic segments
will be defined relative to the period between successive vowels. For
example, when short silent intervals are inserted into a sentence, listeners
notice the insertions only when they disrupt the relative timing of the stressed
vowels (Huggins, 1972). Here we will discuss only the timing of consonant
production relative to the interval between flanking vowels, by far the most
stable of our measures. Over linguistic variations, in this case stress and
rate, these intervals change in their absolute durations. The question is
whether they change in a systematically related manner.

Figure 1, taken from Tuller and Kelso (1984), shows one subject's data
for production of the utterances /babab/, /bapab/, and /bawab/. The data were
similar for all four subjects. The x-axis represents the interval (in ms)
from the onset of jaw lowering for the first vowel to the onset of jaw
lowering for the second vowel. The y-axis is the interval from the onset of jaw
lowering for the first vowel to the onset of upper lip lowering for the medial
labial consonant. In this figure, the jaw component has been subtracted from
the lower lip movement. Each point on the graph is one token of an utterance
type. A Pearson product-moment correlation was calculated for each
distribution. The high correlations obtained (.97, .97, and .89) signify that the
relative timing of these articulatory events was maintained over the large
variation apparent in their absolute durations. For utterances with /ae/ as
the first vowel, we find the same results as for the utterances displayed in
Figure 1; the temporal changes are highly correlated (r = .90, .89, and .84,
for medial consonants b, p, and w, respectively).
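
As an illustration only, the correlation computation described above can be sketched in a few lines. The period and latency values below are hypothetical (they are not the data of Figure 1), and the name `pearson_r` is introduced here for the example; the sketch simply computes a Pearson product-moment correlation between the vowel-to-vowel period and the consonant latency across tokens.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical tokens (ms): vowel-to-vowel period and consonant latency.
periods = [220.0, 260.0, 300.0, 340.0, 380.0]
latencies = [118.0, 131.0, 152.0, 165.0, 186.0]

r = pearson_r(periods, latencies)
print(f"r = {r:.3f}")
```

A value of r near 1 means that latency changes in step with period across tokens, even though both vary widely in absolute duration.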

Figure 1. Timing of lower lip raising for medial consonant articulation as a
function of the vowel-to-vowel period for one subject's productions
of the indicated utterances. Each point represents a single token
of the utterance. (#) primary stress on the first syllable,
spoken at a conversational rate; (o) primary stress on the second
syllable, spoken at a conversational rate; (a) primary stress on
the first syllable, spoken at a faster rate; (a) primary stress
on the second syllable, spoken at a faster rate. [From Tuller &
Kelso, 1984]

Although Figure 1 illustrates the data from only a single subject, the
three other subjects showed essentially the same pattern. Values obtained for
the other subjects by correlating the period between the onsets of successive
vowel articulation ranged from to .97 whether the onset of consonant
articulation was defined by the raising gesture of the lower lip or by the
lowering gesture of the upper lip. In all cases, the calculated correlation was
significantly higher (ps < .05) than what would be expected solely on the
basis of changes in vowel duration or part-whole correlations (cf. Barry,
1983; Munhall, submitted; Tuller & Kelso, 1984; Tuller et al., 1983).
Note, however, that the relationship is not ratiomorphic: lip latency varies
systematically with jaw cycle deviation, plus some intercept value that
changes with speaker and phonetic context. This basic description has
recently been replicated and extended to tongue gestures produced by English
speakers (Harris et al., in press; Munhall, submitted), and lip and jaw gestures
produced by speakers of French and Swedish (Gentil, Harris, Horiguchi, &
Honda, 1984; Lubker, 1983; see also Linville, 1982). Let us underscore again
that this strong relationship holds even though other aspects of the
movements, such as their displacement, velocity, and absolute duration, change
substantially. For example (in agreement with the acoustic/phonetic
literature), the production of destressed syllables shows smaller
displacement, lower velocity, and shorter duration movements than the same phonetic
segment spoken with primary stress (e.g., Harris, 1971, 1978; Kent & Netsell,
1971; Lindblom, 1963; MacNeilage, Hanson, & Krones, 1970; Mermelstein,
1973; Stone, 1981; Sussman & MacNeilage, 1978; Tuller, Harris, & Kelso,
1982).
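
To make concrete what a linear but not ratiomorphic relation looks like, the following minimal sketch fits a least-squares line to hypothetical period and latency values (invented for illustration; not the reported data). The names `fit_line`, `slope`, and `intercept` are ours. With a non-zero intercept, the simple ratio of latency to period drifts across tokens even though the linear relation is tight.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical tokens (ms): vowel-to-vowel period and consonant latency.
periods = [220.0, 260.0, 300.0, 340.0, 380.0]
latencies = [118.0, 131.0, 152.0, 165.0, 186.0]

slope, intercept = fit_line(periods, latencies)
ratios = [lat / per for lat, per in zip(latencies, periods)]

print(f"slope = {slope:.3f}, intercept = {intercept:.1f} ms")
print("latency/period ratios:", [round(q, 3) for q in ratios])
```

In this invented example the fitted line needs two parameters, and the latency/period ratio is not constant, which is the sense in which the relation is linear without being a fixed proportion.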

These relative timing results indicate, we believe, a functional
constraint on movement, that is, a coordinative structure (cf. Easton, 1972;
Fowler, 1977; Kelso et al., 1979a, 1979b; Turvey, 1977) or unit of action
(cf. Ghiselin, 1981), in which a system possessing a large number of potential
degrees of freedom is compressed into one that requires few control decisions
(Bernstein, 1967). During a movement, the timing of individual elements is
constrained within a particular relationship. Flexibility can then be
attained by adjusting control parameters over the total unit.

3.0 Intrinsic Versus Conventional Metrics for the Analysis of Timing

The ubiquity of stable relative timing in so many different types of
activities, including speech production, raises a number of fundamental
questions about the underlying basis of timing regularity in biological systems.
For example, is the duration of each articulatory movement controlled directly
via an extrinsically-imposed timing program, or is the duration of an
articulator's movement a consequence of some other parameter, as yet unspecified?
If not controlled by an extrinsic clock, what is the informational basis for
the observed temporal stability? How, in a complex system of articulators,
does a given articulator "know" when it should be activated in relation to
other active articulators? With respect to our relative timing data, for
example, what information is needed for the upper lip (a remote,
non-mechanically linked articulator) to move in appropriate temporal relation to the jaw?
Although an intrinsic timing theory of speech production has been proposed
(Fowler, 1980), a generalized methodology for evaluating "intrinsic time" has
yet to be offered. Before proposing such a methodology and applying it to
experimental data, let us first clarify the basic notion of intrinsic time
(cf. Richardson & Rosen, 1979).

Although it is convenient to measure time intervals between events as
durations, this is purely a convention since, it can be argued and shown, time
itself has no unique dimension. A few examples will clarify what we mean.
Consider a primitive clock, such as a candle. Time in this case corresponds
to a change in a spatial variable, namely, the length of the candle that is
burned. The units of time intrinsic to this particular dynamical system are
inches. Similarly, in a water clock, the unit of time corresponds to number
of drops. We see very quickly from these simple but intuitive examples that
time itself is not, strictly speaking, a fundamental observable; rather, it
is intrinsically determined by the particular system involved. Thus,
paraphrasing Richardson and Rosen (1979), time is demarcated by defining some
state variable appearing in the events; it is intrinsic to that sequence,
that is, to a dynamical process, and takes its units from the system's state
variables. This kind of intrinsic time is quite different from conventional
time, which is imposed on a system regardless of its particular dynamics.
Conventional or mechanical time (measured in seconds, hours, etc.) plays a
role when it is necessary to determine the relationship between the intrinsic
times of two devices whose dynamic parameters are very different. In such
cases, a standard is introduced. For example, 1/86,400 of the earth's
rotation is, by convention, called a second. Again, according to convention, all
harmonic oscillators are calibrated in terms of seconds, not in terms of an
event that is intrinsic to all harmonic oscillators, namely, the cycle.

We want to stress that calibration and convention, though important, are
not the central issues of concern here. Our aim, rather, is to define an
intrinsic metric in terms of the dynamical behavior of a biological system, in
this case that of speech production. The examples of the candle and the clock
indicate how a given system can generate its own intrinsic time entirely
according to its constitutive parameters. We recently came across similar
sentiments expressed in a rather different context, namely irreversible
thermodynamics and nonequilibrium physics. There, Prigogine (1984) uses the
concept of internal time. In Prigogine's words:


"to grasp the intuitive meaning of internal time, think about a drop
of ink in a glass of water. The form the drop takes gives us an
idea of the interval of time that has elapsed." (p. 6; italics
ours).

And later:

"The internal time T is quite different from the usual mechanical
time, since it depends on the global topology of the system" (p. 7).

The drop of ink, the candle, and the water clock do not possess any knowledge
of their own dynamics, nor do these systems contain an explicit representation
of time. Time evolves from the "playing out" of the dynamics, but there is no
programming, time control, or time representation anywhere (see also Kelso &
Holt, 1980; Kelso, Tuller, & Harris, in press; Kelso, V.-Bateson, Saltzman,
& Kay, 1985). These terms are simply not descriptions appropriate to the
concept of intrinsic time.

4.0 Grounding Intrinsic Time in the Geometry of the Speech System's Dynamics

Let us now follow through with the claim that the notion of intrinsic
time is open to measurement and evaluation. This requires that we
characterize articulatory motion in terms of certain state variables (see Section 3.0
above). To do this, we introduce a phase portrait methodology developed
originally by Poincaré and Liapounov for many-body problems in dynamics, and
employed by us and others for analyses of movements (e.g., Kelso et al.,
1985). The phase portrait captures the forms of motion (a purely
geometric/kinematic description) produced by an underlying dynamic organization
(cf. Abraham & Shaw, 1982). But first let us consider the standard
representation of our speech timing data (see Section 2.0 above).

Consider the simple case we have described in which the latency (in ms)
of onset of lower lip motion for a medial consonant is measured relative to
the interval (in ms) between onsets of jaw motion for flanking vowels. As we
have shown, the two events are highly correlated across rate and stress in
different speakers. Although this strictly temporal description has been
useful, it does not necessarily imply that the speech motor control system is
keeping track of the duration of articulatory motions. In contrast,
preliminary work suggests that the data can be understood without recourse to an
extrinsic timer or timing metric (more extensive analyses are currently
underway). Here we describe in general terms how, using a phase portrait
description of articulatory trajectories on the phase plane, a very different view of
articulatory "timing" emerges (see also Kelso et al., in press).

The phase plane is the space of all possible states of the system, in the
plane whose axes are the articulator's position ($x$) and its velocity
($\dot{x}$). The position and velocity values act as coordinates of a point on
the articulator in two dimensional space. As time varies, the point
$P(x, \dot{x})$ describing the motion of the articulator moves along a certain
path on the phase plane.

Figure 2 illustrates the mapping from time domain to phase plane
trajectories. Hypothetical jaw and upper lip trajectories (position as a function
of time) are shown for an unstressed /bab/ (Figure 2a, left) and a stressed
/bab/ (Figure 2b, left). On the right are shown the corresponding phase plane
trajectories. In this figure and those following we have reversed the typical
orientation of the phase plane so that position is shown on the vertical axis,
and velocity on the horizontal axis. Thus, downward movements of the jaw are
displayed as downward movements of the phase path. The vertical crosshair
indicates zero velocity and the horizontal crosshair indicates zero position
(midway between minimum and maximum displacement). As the jaw moves from its
highest to its lowest point (from A to C in Figure 2), velocity increases
(negatively) to a local maximum (B) then decreases to zero when the jaw
changes direction of movement (C). Similarly, as the jaw is raised from the
low vowel /a/ into the following consonant constriction, velocity peaks
approximately midway through the gesture (D) then returns to zero (A).

Note that time, although implicit and usually recoverable from the phase
plane description, does not appear explicitly. For different initial
conditions (such as starting position or the level of articulator stiffness) there
will be different corresponding paths, and the totality of all possible
trajectories constitutes the full phase portrait of the system's dynamic
behavior. It is useful to transform the Cartesian $x, \dot{x}$ coordinates into
equivalent polar coordinates, namely, a phase angle,
$\phi = \tan^{-1}[\dot{x}/x]$, and a radial amplitude,
$r = [x^2 + \dot{x}^2]^{1/2}$. These polar coordinates are indicated on the phase
planes shown in Figure 2. The phase angle will be a key concept in our
re-analysis of interarticulator timing because it signifies position on a
cycle of states (cf. Abraham & Shaw, 1982; Garfinkel, 1983).
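
The polar transformation can be sketched as follows. This is a minimal illustration, not the analysis code used for the data: the function name `polar_state` is ours, position and velocity are assumed to be already normalized to unitless values, and the quadrant-aware `math.atan2` is used in place of a plain arctangent so that the angle is defined around the whole cycle (an implementation choice we are assuming, with angles reported in the range 0 to 360 degrees).

```python
import math

def polar_state(x, v):
    """Return (phase angle in degrees, radial amplitude) for one state.

    x: normalized position, v: normalized velocity (unitless, assumed).
    """
    phi = math.degrees(math.atan2(v, x)) % 360.0  # quadrant-aware tan^-1(v / x)
    r = math.hypot(x, v)                          # sqrt(x**2 + v**2)
    return phi, r

# Four landmark states on a unit orbit (hypothetical, normalized values).
landmarks = [
    ("highest position", 1.0, 0.0),
    ("peak lowering velocity", 0.0, -1.0),
    ("lowest position (turnaround)", -1.0, 0.0),
    ("peak raising velocity", 0.0, 1.0),
]
for label, x, v in landmarks:
    phi, r = polar_state(x, v)
    print(f"{label}: phase = {phi:.0f} deg, r = {r:.2f}")
```

Under this convention the highest position sits at 0 degrees and the lowest position at 180 degrees; which way the angle runs between them depends on the sign and axis conventions chosen for the plane.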

The phase plane trajectory preserves some important differences between
stressed and unstressed syllables. For example, maximum lowering of the jaw
for the stressed vowel is greater than lowering for the unstressed vowel and
maximum articulator velocity differs noticeably between these two orbits
(e.g., Kelso et al., 1985; MacNeilage et al., 1970; Stone, 1981; Tuller,
Harris, & Kelso, 1982). In contrast, the different durations taken to
traverse the orbit as a function of stress are not represented in this
description. To reiterate, a crucial point about this description is that duration
does not appear explicitly. Jaw cycles of different durations are still
characterized as a single orbit on the plane, i.e., they are topologically the
same.

[Figure 2, panels a, b, and c. Left: TIME SERIES (upper lip and jaw). Right: JAW PHASE PLANE.]

Figure 2. Left: Time series representations of idealized utterances. Right:
Corresponding jaw motions, characterized as a simple mass spring
and displayed on the 'functional' phase plane (i.e., position on
the vertical axis and velocity on the horizontal axis). Parts a,
b, and c represent three tokens with vowel-to-vowel periods (P and
P') and consonant latencies (L and L') that are not linearly
related. Phase position of upper lip movement onset relative to the jaw
cycle is indicated (see text).

Now we can rephrase the question of how the lip "knows" when to begin its
movement for the medial consonant by asking where on the cycle of jaw phase
angles the lip motion for medial consonant production begins. One possibility
is that lip motion begins at the same phase angle of the jaw across different
jaw motion orbits (i.e., across rate and stress). This outcome is not
necessarily entailed, or predicted, by the relative timing results. For
example, Figure 2a through 2c shows three utterances whose vowel-to-vowel periods
and consonant latencies do not change in a linearly related fashion.
Nevertheless, the phase angle at which upper lip motion begins relative to the
cycle of jaw states is identical in the three cases. Thus, the information
for "timing" of a remote articulator (e.g., the upper lip) may not be time
itself, nor absolute position of another articulator (e.g., the jaw), but rather
a relationship defined over the position-velocity state (or, in polar
coordinates, the phase angle) of the other articulator. Although this
conceptualization is intriguing, we want to re-emphasize that it constitutes an
alternative description of the relative timing data set. For example, Figure 3
illustrates the converse of Figure 2, namely, that two (hypothetical)
utterances with identical vowel-to-vowel periods (P) and consonant latencies (L)
can nonetheless show very different phase positions for upper lip movement
onset. To be specific, the phase angle analysis incorporates the full
trajectory of motion; the relative timing analysis is independent of trajectory once
movement has begun and is based only on the onsets and offsets of events.

Figure 3. Two hypothetical utterances having identical vowel-to-vowel periods
(P) and consonant (upper lip) latencies (L) but different phase
angles of upper lip onset. [See caption Figure 2.]
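
The point that identical periods and latencies need not imply identical phase positions can be checked numerically with a toy model. Everything in the sketch below is hypothetical: the jaw cycle is built from two half-cosine segments (lowering, then raising) whose relative durations are set by an invented parameter `lowering_fraction`, velocity is scaled by peak lowering speed, and the phase angle is taken as the quadrant-aware arctangent of velocity over position. It is meant only to show the logic of the comparison.

```python
import math

def jaw_state(t, period, lowering_fraction):
    """Normalized (position, velocity) of a toy jaw cycle at time t (ms).

    The cycle starts at the highest position (+1), lowers to -1 over
    lowering_fraction * period, then rises back to +1.
    """
    t_low = lowering_fraction * period
    t_rise = period - t_low
    if t < t_low:
        x = math.cos(math.pi * t / t_low)
        v = -(math.pi / t_low) * math.sin(math.pi * t / t_low)
    else:
        s = (t - t_low) / t_rise
        x = -math.cos(math.pi * s)
        v = (math.pi / t_rise) * math.sin(math.pi * s)
    peak_lowering_speed = math.pi / t_low
    return x, v / peak_lowering_speed

def phase_deg(x, v):
    """Phase angle in degrees, in the range 0 to 360."""
    return math.degrees(math.atan2(v, x)) % 360.0

# Same period P and same lip-onset latency L for both toy tokens.
P, L = 400.0, 250.0
phase_symmetric = phase_deg(*jaw_state(L, P, 0.5))   # equal lowering and raising
phase_asymmetric = phase_deg(*jaw_state(L, P, 0.3))  # fast lowering, slow raising

print(f"symmetric cycle:  lip onset at {phase_symmetric:.1f} deg")
print(f"asymmetric cycle: lip onset at {phase_asymmetric:.1f} deg")
```

Both toy tokens have the same L/P ratio, yet the jaw is in a different position-velocity state at lip onset, so a relative timing measure cannot distinguish them while a phase measure can.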

We also want to emphasize that the jaw motion need not be perfectly
sinusoidal in order to apply a phase angle analysis. In fact, the motions
actually observed are usually not sinusoidal; position at zero velocity is
affected by the stress and rate characteristics of the surrounding vowels (see
Figures 2 and 4). For this reason, each jaw cycle is normalized by
determining the minimum and maximum jaw positions for the consonant-vowel gesture.
The midpoint between them is used as the best approximation of the equilibrium
or zero position for that syllable. Similarly, jaw velocity (from zero to
peak lowering velocity) is normalized for each cycle.
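
One way to read this normalization is sketched below. The details are assumptions made for illustration: positions are taken as evenly sampled values with upward motion positive, velocity is estimated by finite differences (the differentiation method is not specified in the text), and the function name `normalize_cycle` is ours.

```python
import math

def normalize_cycle(positions, dt):
    """Normalize one jaw cycle; returns (positions, velocities), both unitless.

    Position: midpoint of min and max maps to 0, extremes map to -1 and +1.
    Velocity: finite-difference estimate divided by peak lowering speed.
    Assumes at least two samples, a non-zero range, and some lowering.
    """
    hi, lo = max(positions), min(positions)
    midpoint = (hi + lo) / 2.0
    half_range = (hi - lo) / 2.0
    x_norm = [(p - midpoint) / half_range for p in positions]

    n = len(positions)
    velocities = []
    for i in range(n):
        a, b = max(i - 1, 0), min(i + 1, n - 1)  # one-sided at the ends
        velocities.append((positions[b] - positions[a]) / ((b - a) * dt))

    peak_lowering_speed = max(-v for v in velocities)
    v_norm = [v / peak_lowering_speed for v in velocities]
    return x_norm, v_norm

# Hypothetical jaw record: one cycle, arbitrary units, sampled every 5 ms.
positions = [5.0 + 3.0 * math.cos(2.0 * math.pi * i / 40) for i in range(41)]
x_norm, v_norm = normalize_cycle(positions, dt=0.005)
print(max(x_norm), min(x_norm), min(v_norm))
```

After this step, cycles that differ in displacement and speed occupy a common unit-scaled plane, so phase angles can be compared across stress and rate conditions.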

Figure 4 shows jaw motion on the phase plane for the first syllable of
/'ba#bab/ (top) and /ba#'bab/ (bottom) produced at a fast rate. Each token
shown is the first instance produced of the utterance type. On the left is
the entire jaw cycle for each stress pattern; on the right, the jaw cycle is
reproduced only until the point of onset of upper lip movement downward for
production of the medial bilabial consonant, as measured from the first
deviation from zero velocity. The calculated phase position at which upper lip
motion begins is indicated for each token. Notice that the jaw displacement and
velocity are both greater for the stressed than the unstressed syllable.
Nevertheless, upper lip motion begins at essentially the same phase angle for
both tokens. If upper lip motion began at a phase angle of 180°, it would be
synchronous with the jaw "turnaround" point.

Figure 5 shows the mean data and the standard error of the mean from the
same speaker whose relative timing data were shown in Figure 1. The phase
angle subtended, in degrees, is shown on the y-axis, stress-rate condition on
the x-axis. A 2 x 2 ANOVA for each utterance type showed no significant main
effects or interactions on the phase angle of upper lip onset for medial
consonant production. For /babab/, Fs(1,27) = 2.50, 0.03, and 1.85; for
/bapab/, Fs(1,30) = 2.39, 0.01, and 0.10; for /bawab/, Fs(1,29) = 0.06,
1.16, and 0.61. F-ratios are for rate, stress, and their interaction,
respectively, ps > .5.

There are at least two empirical advantages of this result over our
relative timing description. First, in the relative timing analysis, the overall
correlations across rate and stress conditions are very high, but the
within-condition slopes tend to vary somewhat. In the phase analysis, on the
other hand, the mean phase angle is the same across conditions. Second, remember
that the relative timing scenario was described by two parameters, a slope and
an intercept. The phase description requires only a single parameter. Thus
if nothing else, the phase description is to be preferred on grounds of
parsimony.

[Figure 4 panels: JAW PHASE PLANE TRAJECTORIES, ba bab (Fast); horizontal axis: VELOCITY.]

Figure 4. Left: Jaw cycle on the phase plane for the first token produced of
stressed /bab/ (top) and unstressed /bab/ (bottom), spoken at a
fast rate. Right: Jaw cycle until the onset of upper lip lowering
for the second /b/.

The phase conceptualization also has a number of theoretical advantages
over our original relative timing analysis. First, once articulatory motions
are represented geometrically on the phase plane, duration is normalized
across speaker, stress, speaking rate, etc. Strictly speaking, the system's
topology is unaffected by durational changes. For example, the "down-up"
cycle of the phase plane description is independent of the duration required for
the gesture. This potentially provides a grounding for so-called intrinsic
timing theories of speech production (e.g., Fowler, 1980; Fowler et al.,
1980). Second, neither absolute nor relative durations have to be
extrinsically monitored or controlled in this formulation. There is no need to posit
any kind of time-keeping mechanism or time controller. As an aside, it has
never been clear how the speech system could keep track of time, at least
peripherally, because there is no known afferent basis (such as time receptors)
for time-keeping in the articulatory structures themselves (Kelso, 1978). On
the other hand, an informational basis (e.g., in position and velocity
sensitivities of muscle spindle and joint structures) is a physiological given
in the phase angle characterization. It might well be the case that certain
critical phase angles provide information for coordination between
articulators (beyond those considered here) and/or vocal tract configurations, just as
phase angles of the leg joints provide coupling information for locomotory
coordination (Shik & Orlovskii, 1965). Thus, the temporal orchestration of
articulatory events in the speech motor system unfolds as a consequence of its
dynamic parameters. By extension, it seems unlikely that the symbol structure
for speech production includes a specification of durational rules defined in
conventional mechanical time. As in a candle and a watch, time is not a
possessed, programmed, or represented property of the speech production system.
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A THEORETICAL MODEL OF PHASE TRANSITIONS IN HUMAN HAND MOVEMENTS* 
H. Haken,† J. A. S. Kelso,†† and H. Bunz†



Abstract. Earlier experimental studies by one of us (Kelso, 1981a,
1984) have shown that abrupt phase transitions occur in human hand
movements under the influence of scalar changes in cycling frequency.
Beyond a critical frequency the originally prepared
out-of-phase, antisymmetric mode is replaced by a symmetrical,
in-phase mode involving simultaneous activation of homologous muscle
groups. Qualitatively, these phase transitions are analogous to
gait shifts in animal locomotion as well as phenomena common to other
physical and biological systems in which new "modes" or
spatiotemporal patterns arise when the system is parametrically
scaled beyond its equilibrium state (Haken, 1983). In this paper a
theoretical model, using concepts central to the interdisciplinary
field of synergetics and nonlinear oscillator theory, is developed,
which reproduces (among other features) the dramatic change in
coordinative pattern observed between the hands.

1.0 Introduction 

While researching voluntary oscillatory motions of the two index fingers,
one of us (Kelso, 1981a) observed an interesting phenomenon.¹ Under
instructions to increase the frequency of out-of-phase, antisymmetrical motion
(involving simultaneous flexor and extensor muscle activities), the subject's
finger movements shifted abruptly to an in-phase symmetrical mode that
involved simultaneous activation of homologous muscle groups. This finding was
not restricted to finger movements. In later work (Kelso, 1982, 1984) that
employed similar experimental manipulations, modal transitions in hand motions
around the wrist were also observed: the antisymmetrical phase relationship
between the hands was replaced by symmetrical phasing. Moreover, although the
phase transition occurred at very different frequencies of hand motion for
different subjects, it was nevertheless predictable. When the transition
frequency was expressed in units of preferred frequency, i.e., an independent
measure of the rate at which each subject was content to cycle the hands "as
if he/she were going to do it all day," the resulting dimensionless ratio or
"critical value" was constant for all subjects. Introducing a frictional
resistance to movement systematically changed both the preferred and transition
frequencies for each subject, but did not change the critical value across all
subjects (Kelso, 1984).

The most dramatic aspect of these simple experiments, addressed in detail
in the present theoretical model, is the sudden and completely involuntary
change in the ordering or phasing among muscle groups that occurs at a
critical, intrinsically defined frequency (see Figure 1). In this feature, the
hand movement data share a likeness to gait transitions in locomotion.² For
example, Shik, Severin, and Orlovskii (1966) showed that a steady increase in
electrical stimulation to the midbrain region of the decerebrate cat was
sufficient not only to induce an increase in locomotion rate, but, above a
certain value of current, gait shifts as well. Like the hand experiments in
which "flipping" from one mode to another occasionally occurred at higher
movement frequencies, they too noted the presence of unstable regions in which
the cat shifted from trotting to galloping and back again.

Figure 1. Bottom. Displacements over time of left (solid line) and right
          (dashed line) hands. The subject is simply increasing cycling
          frequency in an antisymmetric mode in response to a verbal cue from
          the experimenter. Top. Phase relationship between the two hands.
          The peaks of one hand movement act as a 'target' file and their
          phase position is calculated continuously relative to the
          peak-to-peak period of the other 'reference' file. The graphic
          display repeats the phase curve so that phase lags and leads can be
          noted.

Though the hand data as well as these findings on quadruped gait strongly
suggest that changes in coordination may be ordered by changes in a single
parameter, the neural processes underlying such motoric phase transitions are
still poorly understood. As Grillner (1982, p. 224) notes for the case of
quadruped gait transitions, the general conception is that there is a "switch
mechanism" in which "coordinating fibers" serve to switch among hindlimb
neural networks. But such coordinating fibers have yet to be identified
neuroanatomically (Grillner, 1982) and their exact functional role in
determining locomotor pat-



This problem of relating neuronal events to global patterns of behavior
(in the present case abrupt macroscopic changes in the phasing of
neuromuscular activities and changes in characteristic quantities such as
frequency and amplitude) is somewhat reminiscent of a similar problem
confronting physicists about 50 years ago. After it was discovered that matter
consists of atoms and after the properties of atoms were understood
theoretically, it may have seemed straightforward to derive the macroscopic
properties of matter directly from the properties of the individual atoms. It
turned out, however, that such a goal could not be reached immediately and it
proved extremely fruitful to introduce macroscopic quantities for purposes of
system description. Only later did it become possible to derive the equations
governing the macroscopic quantities by means of a microscopic theory (for
review and examples, see e.g., Wilson, 1979). It has been shown quite
generally in the interdisciplinary field of synergetics (e.g., Haken, 1983)
that in many cases the behavior of complex systems can be successfully modeled
by means of a few macroscopic quantities in those situations where the
behavior of the system changes qualitatively. Such macroscopic observables are
called "order parameters" following a term first introduced by Landau (1936)
to describe the "degree of order" (cf. Ter Haar, 1965, p. 208) of matter as it
undergoes changes in phase. In synergetics, however, which deals with
cooperative phenomena in non-equilibrium, open systems, the concept of order
parameter has added significance: not only is it created by the cooperation of
the individual components of a complex system, but the order parameter in turn
governs the behavior of these components (for many examples, see Haken, 1975).
Even in physical and chemical systems, finding the correct order parameter(s)
is not always a simple matter. In the case of biological systems in general,
and movement control and coordination in particular, the strategic approach of
synergetics allows some license in selecting order parameters, an issue that
we turn to next.



2.0 Initial Development of the Model:
Order Parameters and the Potential Function

To summarize, the main features of the experiments described briefly
above are: (i) the presence of only two stable phase (or "attractor") states
between the hands (which one is observed is a function of how the system is
prepared, i.e., an instruction to move the hands in the out-of-phase or
in-phase mode); (ii) the abrupt transition from one attractor state to the
other at a critical cycling frequency; (iii) beyond the transition, only one
mode (symmetrical in-phase) is observed; and (iv) when cycling frequency is
reduced, the system stays in the symmetrical mode, i.e., it does not return to
its initially prepared state, a result that suggests coexistence of the basins
of attraction for the symmetrical and antisymmetrical modes and the depletion
of one of them. Taken together, these results as well as other findings in
the motor control literature support the hypothesis that phase is a relevant
macroscopic (or "essential") parameter of certain movement patterns.⁵ For
example, the internal phasing structure of activities as widely varied as
chewing, locomotion, handwriting, and speech remains invariant across scalar
changes in force or rate (Grillner, 1982; Kelso, 1981b; Schmidt, 1982;
Tuller, Kelso, & Harris, 1982). Similarly, in the experiments described
above, phase is preserved constant over a wide range of frequencies, even
though the magnitudes and durations of muscle activities and other kinematic
variables change considerably. Only when frequency is scaled beyond a
critical value does a phase shift occur.

In the present paper it seems reasonable to propose phase as an order pa- 
rameter for at least two reasons. First, unlike many other possible 
candidates, phase is an accurate reflection of the cooperativity among the
components of the system. Thus, we can say, in a manner consistent with 
synergetics, that the configuration of the subsystems (in the present context 
defined as the individual hand motions) specifies their phase relation, and 
conversely, that the phase variable specifies the spatiotemporal ordering of 
the subsystems. Second, it is phase that remains invariant across
transformations in many motor activities that involve very different anatomical
substrates. This highlights an important further feature of the order param- 
eter concept, namely, that the order parameter (by hypothesis here, the rela- 
tive phase) changes much more slowly than the variables describing the behav- 
ior of the individual components (e.g., velocities of each hand motion). 




Figure 2. The displacements $x_1$ and $x_2$ of the finger tips of the left and
          right hand in the symmetrical (in-phase, homologous) mode.

Our first step in the development of the present model is to provide a
mathematically accurate description of the main qualitative features of the
data. We therefore specify a potential function that corresponds to the
layout of attractor states and show how that layout is altered as a control
parameter is changed. In a following section, we employ nonlinear oscillator
theory to show how the model equations describing the potential function can
be derived from the equations of motion of each hand and a (nonlinear)
coupling between them.

For the sake of clarity we introduce the elongations of the finger tips $x_1$
and $x_2$ as shown in Figure 2. In order to define the relative phase $\phi$
we assume that the motion of the hands is more or less harmonic (see Figure 1)
so that we put

2.1     $x_1 = r_1 \cos(\omega t + \phi_1)$,

2.2     $x_2 = r_2 \cos(\omega t + \phi_2)$,

where $\omega$ is the basic frequency of the hand movement, while the
amplitudes $r_1$, $r_2$ and the phases $\phi_1$, $\phi_2$ are time dependent
quantities whose time dependence is assumed to be much slower than that
defined by the frequency $\omega$. The relative phase is defined by

2.3     $\phi = \phi_2 - \phi_1$.⁴

In order to describe the change of phase we adopt basic ideas from
synergetics. As shown in synergetics, in many cases the equations for order
parameters are of the form

2.4     $\dot{\phi} = -\frac{dV}{d\phi}$,

where V is the so-called potential function. In our search for a model we
make a few rather obvious assumptions about V. Since $\phi$ occurs under
cosine or sine functions (cf. [2.1, 2.2]), the properties of the physical
system must not change when $\phi$ is replaced by $\phi + 2\pi$.
Consequently, we shall postulate that the potential V is periodic:

2.5     $V(\phi + 2\pi) = V(\phi)$.

We furthermore introduce the assumption that both hands play a symmetric
role. In such a case the behavior of the system must not depend on the way
we label the right hand and the left hand. This means that V must remain
unchanged when we exchange the indices 1 and 2 in Equation 2.3. This in turn
means that the potential V is symmetric:

2.6     $V(\phi) = V(-\phi)$.

Figure 3. The potential V (2.7) for b = 0.

Figure 4. The potential V (2.7) for a = 0.



We assume that V obeys the conditions (2.5) and (2.6) in the simplest form
that explains the above mentioned experimental results. To this end we write
V as a superposition of two cosine functions:

2.7     $V = -a\cos\phi - b\cos 2\phi$.

As is known from synergetics, the behavior of the system obeying the equation
(2.4) can be easily described by identifying $\phi$ with the coordinate of a
particle which moves in an overdamped fashion in the potential V. To
illustrate this let us consider Figure 3 where b is put equal to 0. There is
only one stable equilibrium position, namely at $\phi = 0$. When we take
$a = 0$, $b > 0$, the potential function looks like the one shown in Figure 4.
Here we have two equivalent positions, namely at $\phi = 0$ and $\phi = \pi$
(which is equivalent to $\phi = -\pi$). When we take the total superposition
(2.7) but change the ratio b/a we run through a series of potential fields
shown in Figure 5. When we initially prepare the system in a state shown by
the black ball and increase the frequency, and likewise assume that b/a
decreases with increasing frequency, we obtain a critical value $\omega_c$
where the ball falls to the lower minimum belonging to $\phi = 0$. This means
that the hand movement made a transition from the antisymmetric
($\phi = -\pi$) state into the symmetric state with $\phi = 0$. The hand
movement stays in that state when $\omega$ is further increased. When we
decrease $\omega$ starting from high values, the system remains all the time
in the $\phi = 0$ state even if $\omega$ drops below $\omega_c$. This
"hysteresis" phenomenon is well known in many physical and biological systems.
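The ball-in-a-landscape picture above can be tried out numerically. The following is only an illustrative sketch, not part of the original model development: it integrates the overdamped motion of equation (2.4) in the potential (2.7) with a plain forward Euler step, lowers the ratio b/a in stages and then raises it again. The function names, the step size, the list of ratios, and the small "kick" that stands in for fluctuations are all assumptions made for the example.

```python
import math

def dV_dphi(phi, a, b):
    """Slope of V(phi) = -a*cos(phi) - b*cos(2*phi), eq. (2.7)."""
    return a * math.sin(phi) + 2.0 * b * math.sin(2.0 * phi)

def relax(phi, a, b, t_total=200.0, dt=0.01):
    """Overdamped motion d(phi)/dt = -dV/dphi (eq. 2.4), forward Euler."""
    for _ in range(int(round(t_total / dt))):
        phi -= dt * dV_dphi(phi, a, b)
    return phi

def sweep(ratios, phi_start, a=1.0, kick=-1e-3):
    """Follow the ball through a sequence of b/a values.

    Before each relaxation the phase is nudged by `kick`, a hypothetical
    stand-in for the small fluctuations that let the ball leave an
    equilibrium once it has become unstable.
    """
    phi, visited = phi_start, []
    for ratio in ratios:
        phi = relax(phi + kick, a, ratio * a)
        visited.append(phi)
    return visited

down = [1.0, 0.5, 0.3, 0.2, 0.1]   # b/a assumed to fall as frequency rises
up = list(reversed(down))          # frequency lowered again
phases_down = sweep(down, math.pi) # prepared in the antisymmetric state
phases_up = sweep(up, phases_down[-1])

for ratio, phi in zip(down + up, phases_down + phases_up):
    print(f"b/a = {ratio:4.2f}   phi = {phi:7.4f}")
```

In this sketch the phase stays at $\pi$ while b/a is above 1/4, drops to 0 once b/a is below 1/4, and does not return to $\pi$ when b/a is raised again, which is the hysteresis described in the text.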




Figure 5. The potential V/a for the varying values of b/a. The numbers refer
          to the ratio b/a.

In order to study at which value of b/a the transition occurs, we seek
the extrema of V that are defined by

2.8     $\dot{\phi} = -\frac{dV}{d\phi} = 0$.

Using (2.7), (2.8) reads:

2.9     $-a\sin\phi - 2b\sin 2\phi = 0$.

The second term can be transformed by means of:

2.10    $\sin 2\phi = 2\sin\phi\cos\phi$

so that (2.9) can be cast into the form:

2.11    $-a\sin\phi - 4b\sin\phi\cos\phi = 0$.

One set of roots is given by:

2.12    $\sin\phi = 0$,

namely

2.13    $\phi = 0$, $\phi = \pm\pi$.

The other set of roots is given by:

2.14    $-a - 4b\cos\phi = 0$

or, when we solve for $\cos\phi$, by:

2.15    $\cos\phi = -\frac{a}{4b}$.

This value of $\cos\phi$ corresponds to the inner maxima of V. The transition
occurs when these maxima vanish, which is the case if (2.15) can no longer be
fulfilled by a real $\phi$. This happens provided:



2.16    $\left|\frac{a}{4b}\right| > 1$

or

2.17    $|b| < |a|/4$,

i.e., the transition occurs if |b| drops below the critical value
$b_c = |a|/4$. On the other hand, we know from experiments that the amplitudes
$r_1$, $r_2$ decrease with increasing $\omega$. This suggests that b can be
expressed by means of the amplitude $r = r_1 = r_2$ and a critical amplitude
$r_c$ so that we may write the potential function in the form:

2.18    $V = -a\left(\cos\phi + \frac{1}{4}\left(\frac{r}{r_c}\right)^2 \cos 2\phi\right)$,

where $r_c$ is defined as that value of r where the transition occurs.
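The extremum analysis of (2.8) to (2.17) is easy to check by direct evaluation. The sketch below is an illustration only; the helper names `inner_maximum` and `curvature` are hypothetical. It locates the inner extremum from (2.15) when a real root exists and uses the sign of the second derivative of V at $\phi = \pi$ to label the landscape as bistable or monostable.

```python
import math

def inner_maximum(a, b):
    """Phase in (0, pi) with cos(phi) = -a/(4b), eq. (2.15).

    Returns None when no real root exists, i.e. when |a/(4b)| > 1 (eq. 2.16).
    """
    if b == 0.0 or abs(a / (4.0 * b)) > 1.0:
        return None
    return math.acos(-a / (4.0 * b))

def curvature(phi, a, b):
    """Second derivative of V = -a*cos(phi) - b*cos(2*phi).

    Negative at a maximum of V, positive at a minimum.
    """
    return a * math.cos(phi) + 4.0 * b * math.cos(2.0 * phi)

a = 1.0
for b in (1.0, 0.5, 0.26, 0.24, 0.1):
    phi_star = inner_maximum(a, b)
    antiphase_is_minimum = curvature(math.pi, a, b) > 0.0
    label = "bistable" if antiphase_is_minimum else "monostable"
    print(f"b/a = {b / a:4.2f}  inner maximum at {phi_star}  ->  {label}")
```

For a = 1 the label switches between b = 0.26 and b = 0.24, in line with the critical value $b_c = |a|/4$.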

3.0 Further Development of the Model 

In the next step of our analysis we want to show how the model equations
derived in the previous section can be derived from equations for the
movements of the individual hands and a coupling between them. We write the
corresponding equations in the form:

3.1     $\ddot{x}_1 + f_1(x_1, \dot{x}_1) = I_{12}(x_1, x_2)$,

3.2     $\ddot{x}_2 + f_2(x_2, \dot{x}_2) = I_{21}(x_1, x_2)$.

The left hand sides describe the motion of the individual hands with
amplitudes $x_1$ and $x_2$, respectively, while the right hand sides describe
the coupling. Of course, the coupling is achieved via the nervous system and
in this way the equations (3.1) and (3.2) describe a complex system composed
of the mechanical motions of the hands generated, in large part, by
neuromuscular input. It is our goal to derive a minimal model for the
macroscopic observables that are now the amplitudes and phases of the hand
motion. Since the motion is basically oscillatory, we need at least a second
order differential equation so that the terms $\ddot{x}_1$, $\ddot{x}_2$
occur. With respect to the restoring and damping forces we have a certain
repertoire at hand and in all likelihood the choice of $f_1$ and $f_2$ is not
unique. Since the hand movement has a more or less stable amplitude the
equations must be nonlinear. We study several different examples. The first
is well known from the operation of vacuum tube oscillators, but here, of
course, we shall use only its mathematical properties. Let us consider the
Van der Pol equation of the form:

3.3     $\ddot{x} + \varepsilon(x^2 - r_0^2)\dot{x} + a x = 0$,

where $\varepsilon$, $r_0$ are adjustable, but then fixed parameters, while a
serves as a control parameter. In order to solve this equation for not too
high amplitudes and in order to cast it into a form convenient for our later
purposes we put:

3.4     $x = A e^{i\omega t} + A^* e^{-i\omega t}$,

where $\omega^2 = a$, and the complex amplitude A can be time dependent. It is
assumed, however, that its time dependence is much slower than that of
$e^{i\omega t}$. One can then perform two approximations well known in the
theory of nonlinear oscillators (e.g., Haken, 1984). The "slowly varying
amplitude approximation" means that we neglect terms $\ddot{A}$ compared to
terms $\omega\dot{A}$. The "rotating wave approximation" means that we may
neglect terms containing $e^{3i\omega t}$ and $e^{-3i\omega t}$, compared to
$e^{i\omega t}$ and $e^{-i\omega t}$. By means of these approximations (3.3)
is transformed into:

3.5     $e^{i\omega t}\, i\omega\left(2\dot{A} + \varepsilon(A|A|^2 - A r_0^2)\right) = 0$.

In the steady state the amplitude A is a constant and the only nontrivial
solution reads $|A|^2 = r_0^2$. Thus the amplitude becomes frequency
independent. In order to find a decrease of the amplitude with frequency we
adopt a new model equation, namely:

3.6     $\ddot{x} + \varepsilon(\dot{x}^2 - \omega_0^2 r_0^2)\dot{x} + a x = 0$.

Making again the rotating wave approximation and the slowly varying amplitude
approximation and using the hypothesis (3.4) we readily obtain for (3.6):

3.7     $e^{i\omega t}\, i\omega\left(2\dot{A} + \varepsilon(3A|A|^2\omega^2 - A\omega_0^2 r_0^2)\right) = 0$.

In the steady state where $\dot{A} = 0$, (3.7) has the nontrivial solution

3.8     $|A| = \frac{\omega_0 r_0}{\sqrt{3}\,\omega}$.

Thus the amplitude indeed drops with $\omega^{-1}$, giving us therefore a
model equation that describes both the oscillatory motions of the hands and a
drop in amplitude with increasing $\omega$.
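The $\omega^{-1}$ amplitude law can be compared with a direct integration of equation (3.6). The sketch below is an illustration under stated assumptions rather than the computation used in the report: it takes $a = \omega^2$, picks $\varepsilon = 0.2$ and $\omega_0 r_0 = 1$ purely for convenience, and uses a hand-written fourth order Runge-Kutta step. Because the hypothesis (3.4) gives $x = 2|A|\cos(\omega t + \ldots)$, the peak displacement is compared with $2\omega_0 r_0/(\sqrt{3}\,\omega)$.

```python
import math

def rayleigh_rhs(x, v, omega, eps=0.2, omega0=1.0, r0=1.0):
    """Eq. (3.6) as a first-order system, with restoring coefficient a = omega**2.

    eps, omega0 and r0 are illustrative values, not taken from the report.
    """
    return v, -eps * (v * v - (omega0 * r0) ** 2) * v - omega ** 2 * x

def rk4_step(x, v, dt, omega):
    """One classical fourth order Runge-Kutta step."""
    k1x, k1v = rayleigh_rhs(x, v, omega)
    k2x, k2v = rayleigh_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, omega)
    k3x, k3v = rayleigh_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, omega)
    k4x, k4v = rayleigh_rhs(x + dt * k3x, v + dt * k3v, omega)
    x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x, v

def settled_amplitude(omega, t_total=200.0, dt=0.01):
    """Peak |x| over the last tenth of the run, after transients have died out."""
    x, v, peak = 0.1, 0.0, 0.0
    n = int(round(t_total / dt))
    for i in range(n):
        x, v = rk4_step(x, v, dt, omega)
        if i >= 0.9 * n:
            peak = max(peak, abs(x))
    return peak

def predicted_amplitude(omega, omega0=1.0, r0=1.0):
    """Peak displacement 2|A| with |A| from eq. (3.8)."""
    return 2.0 * omega0 * r0 / (math.sqrt(3.0) * omega)

for omega in (1.0, 2.0, 4.0):
    print(omega, settled_amplitude(omega), predicted_amplitude(omega))
```

The agreement is expected to be close only while $\varepsilon/\omega$ is small, since (3.8) rests on the slowly varying amplitude and rotating wave approximations.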

The experimental results suggest a superposition of a constant amplitude
function (corresponding to $\varepsilon_1$ in eq. 3.9) and a function that
decreases with $\omega$ (corresponding to $\varepsilon_2$ in eq. 3.9). That
is, there is an intercept as well as a slope to the observed relationship
between amplitude and frequency of hand movement. Such a behavior can be
modeled by a superposition of (3.3) and (3.6):

3.9     $\ddot{x} + \left[\varepsilon_1(x^2 - r_0^2) + \varepsilon_2(\dot{x}^2 - \omega_0^2 r_0^2)\right]\dot{x} + a x = 0$.

This leads in the steady state to:

3.10    $|A|^2 = \frac{(\varepsilon_1 + \varepsilon_2\omega_0^2)\, r_0^2}{\varepsilon_1 + 3\varepsilon_2\omega^2}$.

As we shall see, however, the main features of the phase transition can be
modeled by choosing (3.6) as a basic equation.
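The two limits of the steady-state expression (3.10) can be read off with a few lines of code. This is a sketch for illustration; the function name and the default values $\omega_0 = r_0 = 1$ are assumptions.

```python
def hybrid_amplitude_sq(omega, eps1, eps2, omega0=1.0, r0=1.0):
    """Steady-state |A|**2 of the combined oscillator (3.9), following eq. (3.10)."""
    return (eps1 + eps2 * omega0 ** 2) * r0 ** 2 / (eps1 + 3.0 * eps2 * omega ** 2)

# eps2 = 0 recovers the frequency independent Van der Pol result |A|^2 = r0^2,
# eps1 = 0 recovers |A|^2 = omega0^2 r0^2 / (3 omega^2) from eq. (3.8).
for omega in (1.0, 2.0, 3.0):
    print(omega,
          hybrid_amplitude_sq(omega, 1.0, 0.0),
          hybrid_amplitude_sq(omega, 0.0, 1.0),
          hybrid_amplitude_sq(omega, 1.0, 1.0))
```

With both coefficients nonzero the amplitude still falls with frequency, but more gently than $\omega^{-1}$, which is the "intercept as well as a slope" behavior mentioned above.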

We now come to the central problem, namely to derive a suitable coupling
between the two macroscopic quantities, i.e., the amplitudes $x_1$ and $x_2$.
The simplest hypothesis would be a linear coupling of the form:

3.11    $I_{12} = \alpha(x_1 - x_2)$.

However, as we shall see below such a coupling will not lead to the required
potential V for the relative phase. Rather we have to add a nonlinear
coupling. Requiring that this coupling term has the same symmetry properties
as (3.11) we are led to a coupling term of the following kind:

3.11a   $I_{12} = \alpha(x_1 - x_2) + \beta(x_1 - x_2)^3$.

A detailed analysis reveals that such a coupling term still lacks an
important feature, namely, it does not produce the correct phase relation
between the motions of the individual hands. We have to introduce the coupling
term either via a time delay or by using time derivatives. We first study the
coupling by a time delay. This can be achieved by averaging over past values
of (3.11a) so that we can replace (3.11a) by:

3.11b   $I_{12} = \int_{-\infty}^{t}\left[\alpha(x_1 - x_2)_\tau + \beta(x_1 - x_2)_\tau^3\right] e^{-\gamma(t-\tau)}\,d\tau$.

An equivalent formulation is obtained by the requirement that $I_{12}$ obeys
the differential equation:

3.11c   $\dot{I}_{12} + \gamma I_{12} = \alpha(x_1 - x_2) + \beta(x_1 - x_2)^3$.
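The equivalence of the exponentially weighted average (3.11b) and the relaxation equation (3.11c) can be illustrated numerically. In the sketch below, which is not taken from the report, the driving term $\alpha(x_1 - x_2) + \beta(x_1 - x_2)^3$ is replaced by the stand-in signal $\sin t$, and $\gamma = 0.5$ is an arbitrary illustrative value. For that signal both formulations should approach the closed form $(\gamma\sin t - \cos t)/(1 + \gamma^2)$.

```python
import math

GAMMA = 0.5  # hypothetical decay rate, chosen only for illustration

def source(t):
    """Stand-in for alpha*(x1 - x2) + beta*(x1 - x2)**3; here simply sin(t)."""
    return math.sin(t)

def memory_integral(t, window=60.0, h=0.001):
    """Eq. (3.11b) by the trapezoidal rule, lower limit truncated at t - window."""
    n = int(round(window / h))
    total = 0.5 * (source(t - window) * math.exp(-GAMMA * window) + source(t))
    for k in range(1, n):
        tau = t - window + k * h
        total += source(tau) * math.exp(-GAMMA * (t - tau))
    return total * h

def relaxation_ode(t_end, dt=0.001):
    """Eq. (3.11c), dI/dt = -GAMMA*I + source(t), by Runge-Kutta from I(0) = 0."""
    i_val, t = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = -GAMMA * i_val + source(t)
        k2 = -GAMMA * (i_val + 0.5 * dt * k1) + source(t + 0.5 * dt)
        k3 = -GAMMA * (i_val + 0.5 * dt * k2) + source(t + 0.5 * dt)
        k4 = -GAMMA * (i_val + dt * k3) + source(t + dt)
        i_val += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return i_val

t_check = 40.0
closed_form = (GAMMA * math.sin(t_check) - math.cos(t_check)) / (1.0 + GAMMA ** 2)
print(memory_integral(t_check), relaxation_ode(t_check), closed_form)
```

The differential form started from I(0) = 0 differs from the full-history integral only by a transient that decays as $e^{-\gamma t}$, so the comparison is made well after that transient has vanished.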

In order to facilitate the subsequent calculation we shall assume that
$\gamma$ is much smaller than $\omega$. This assumption is not all that
crucial, however, since it does not change the basic structure of the
equations. In order to proceed further we differentiate equations (3.1) and
(3.2) with respect to time. Making use again of the slowly varying amplitude
approximation and the rotating wave approximation, equation (3.1) acquires the
form:

3.12    $-\omega^2\left(2\dot{A}_1 + \varepsilon(3A_1|A_1|^2\omega^2 - A_1\omega_0^2 r_0^2)\right) = \alpha A_1 + 3\beta A_1|A_1|^2 + K_{12}$.

The first two terms on the right hand side of (3.12) can be absorbed into the
terms on the left hand side containing the factor $\varepsilon$ and do not
alter qualitatively the behavior of the system. The term $K_{12}$ specifies
the coupling influence of oscillator 1 on oscillator 2, corresponding to the
motions of the two hands. In the above mentioned approximation $K_{12}$ reads:

3.13    $K_{12} = -\alpha A_2 - 3\beta(A_1^2 A_2^* + 2|A_1|^2 A_2) + 3\beta(2A_1|A_2|^2 + A_2^2 A_1^*) - 3\beta A_2|A_2|^2$.
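The rotating wave bookkeeping behind (3.12) and (3.13) can be verified without any algebra by projecting the coupling onto its first harmonic. The sketch below is an illustration with hypothetical helper names: it builds $x_j = A_j e^{i\theta} + A_j^* e^{-i\theta}$ with $\theta = \omega t$ for random complex amplitudes, evaluates $\alpha(x_1 - x_2) + \beta(x_1 - x_2)^3$ on a grid over one period, extracts the coefficient of $e^{i\theta}$, and compares it with the right hand side of (3.12).

```python
import cmath
import math
import random

def first_harmonic(signal, n=64):
    """Coefficient of exp(i*theta) in a periodic signal sampled at n points."""
    total = 0j
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += signal(theta) * cmath.exp(-1j * theta)
    return total / n

def coupling_k12(a1, a2, alpha, beta):
    """K_12 as written in eq. (3.13)."""
    return (-alpha * a2
            - 3 * beta * (a1 ** 2 * a2.conjugate() + 2 * abs(a1) ** 2 * a2)
            + 3 * beta * (2 * a1 * abs(a2) ** 2 + a2 ** 2 * a1.conjugate())
            - 3 * beta * a2 * abs(a2) ** 2)

def mismatch(seed, alpha=-0.2, beta=0.2):
    """Distance between the projected coupling and the right side of (3.12)."""
    rng = random.Random(seed)
    a1 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    a2 = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

    def x(a, theta):
        # x = A e^{i theta} + A* e^{-i theta}, i.e. twice the real part
        return 2.0 * (a * cmath.exp(1j * theta)).real

    def drive(theta):
        d = x(a1, theta) - x(a2, theta)
        return alpha * d + beta * d ** 3

    projected = first_harmonic(drive)
    expected = (alpha * a1 + 3 * beta * a1 * abs(a1) ** 2
                + coupling_k12(a1, a2, alpha, beta))
    return abs(projected - expected)

print(max(mismatch(seed) for seed in range(5)))
```

Because the signal contains only the first and third harmonics, a uniform grid of 64 points recovers the first-harmonic coefficient up to rounding error.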

We are now in a position to show how our model equations (3.1), (3.2) with the
specific choice (3.11b) allow us to derive the order parameter equation (2.4).
To this end we make the hypothesis:

3.14    $A_j = r_j e^{i\phi_j}$,  $j = 1, 2$,

where $r_j$ and $\phi_j$ may be time dependent, which transforms (3.12) into:

3.15    $e^{i\phi_1}\left[-\omega^2\left\{2\dot{r}_1 + 2i\dot{\phi}_1 r_1 + \varepsilon(3\omega^2 r_1^3 - \omega_0^2 r_0^2 r_1)\right\} - \alpha r_1 - 3\beta r_1^3\right] = K_{12}$.

$K_{12}$ acquires the form:

3.16    $K_{12} = -\alpha r_2 e^{i\phi_2} - 3\beta r_1^2 r_2\left(2e^{i\phi_2} + e^{i(2\phi_1 - \phi_2)}\right) + 3\beta r_1 r_2^2\left(2e^{i\phi_1} + e^{i(2\phi_2 - \phi_1)}\right) - 3\beta r_2^3 e^{i\phi_2}$.

Similarly the equation for oscillator 2 contains the coupling term:

3.17    $K_{21} = -\alpha r_1 e^{i\phi_1} - 3\beta r_2^2 r_1\left(2e^{i\phi_1} + e^{i(2\phi_2 - \phi_1)}\right) + 3\beta r_2 r_1^2\left(2e^{i\phi_2} + e^{i(2\phi_1 - \phi_2)}\right) - 3\beta r_1^3 e^{i\phi_1}$.

We divide equation (3.15) by $-\omega^2 r e^{i\phi_1}$ and consider its
imaginary part,

3.18    $\dot{\phi}_1 = -\frac{1}{g(r)}\,\mathrm{Im}\left(e^{-i\phi_1} K_{12}\right)$,

where $g(r) = 2\omega^2 r$.

Applying the analogous procedure to oscillator 2 and taking the difference of
the two equations for $\phi_1$, $\phi_2$ we obtain after a small intermediate
calculation

3.19    $\dot{\phi} = -\frac{2}{g(r)}\left[(\alpha r + 6\beta r^3)\sin\phi - 3\beta r^3 \sin 2\phi\right]$.

We have assumed that $r = r_1 = r_2$ and is well-stabilized, so that r is
practically time independent. If the assumption $\gamma \ll \omega$ is
dropped, then g(r) is replaced in eq. (3.19) by
$g(r) = 2(\omega^2 + \gamma^2) r$. As we shall see $\beta$ must have a sign
opposite to $\alpha$ in order to obtain agreement with experimental findings.
Therefore we put:

3.20    $\beta = -\tilde{\beta}$.

Thus, we are left with our final equation:

3.21    $\dot{\phi} = -\frac{2}{g(r)}\left[(\alpha r - 6\tilde{\beta} r^3)\sin\phi + 3\tilde{\beta} r^3 \sin 2\phi\right]$,

which indeed has the required structure of the order parameter equation (2.4)
with (2.7). However, we are now in a position to relate the coefficients a
and b to the amplitude r. The phase transition takes place for:

3.22    $|b| = \frac{1}{4}|a|$;
        if $4|b| > |a|$: bistable,
        if $4|b| < |a|$: monostable

(cf. [2.16]). Comparing the coefficients a and b with those occurring in
(3.21) enables us to cast (3.22) into the form:

3.23    $\frac{3}{2}\tilde{\beta} r_c^2 = \frac{1}{4}\left(\alpha - 6\tilde{\beta} r_c^2\right)$

or, after a little algebra, into

3.24    $r_c^2 = \frac{\alpha}{12\tilde{\beta}}$.

We thus find that bistable operation, particularly in the antisymmetric mode,
occurs when $r^2$ exceeds the critical value given by (3.24). In the other
case, with decreased amplitude the system becomes monostable and operates in
the symmetric mode.

As mentioned above, there is still another possibility of defining $I_{12}$,
namely by means of time derivatives. In the sense of a minimal model we
choose:

3.25    $I_{12} = (\dot{x}_1 - \dot{x}_2)\left(\alpha + \beta(x_1 - x_2)^2\right)$.

Again, to the same degree of approximation as used in eqs. (3.11)-(3.21) we
obtain the equation for the phase:

3.26    $\dot{\phi} = (\alpha + 2\beta r^2)\sin\phi - \beta r^2 \sin 2\phi$.

The critical amplitude is then given by:

3.27    $r_c^2 = -\frac{\alpha}{4\beta}$.
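The role of the critical amplitude in (3.26) and (3.27) can be illustrated by integrating the phase equation at a fixed amplitude on either side of $r_c$. The sketch below is an illustration only: it uses $\alpha = -0.2$ and $\beta = 0.2$ (the coupling values quoted for the simulation of Figure 6), treats r as a constant, and starts the phase slightly away from $\pi$. The function names, the step size, and the two test amplitudes are assumptions.

```python
import math

def phase_rate(phi, r, alpha, beta):
    """Right-hand side of eq. (3.26)."""
    return ((alpha + 2.0 * beta * r * r) * math.sin(phi)
            - beta * r * r * math.sin(2.0 * phi))

def critical_amplitude(alpha, beta):
    """r_c from eq. (3.27); requires alpha and beta of opposite sign."""
    return math.sqrt(-alpha / (4.0 * beta))

def settle(r, alpha=-0.2, beta=0.2, phi=math.pi - 0.01, t_total=400.0, dt=0.05):
    """Forward Euler integration of the phase at constant amplitude r."""
    for _ in range(int(round(t_total / dt))):
        phi += dt * phase_rate(phi, r, alpha, beta)
    return phi

r_c = critical_amplitude(-0.2, 0.2)
print("r_c =", r_c)
print("r = 0.6:", settle(0.6))   # above r_c: the phase returns to pi
print("r = 0.4:", settle(0.4))   # below r_c: the phase moves to 0
```

Near $\phi = \pi$ the deviation obeys a linear law with rate $-(\alpha + 4\beta r^2)$, so the antisymmetric state is stable only when $r^2 > -\alpha/(4\beta)$, which is the content of (3.27).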

In the transitions generally studied in synergetics, fluctuating forces play
an important role. Extrapolating to the present case, a transition, say from
$\phi = \pi$ to $\phi = 0$, can be initiated only if fluctuating forces F are
present. To this end we enlarge the equations (3.1, 3.2) to include such
forces, so that these equations now read:

3.28    $\ddot{x}_1 + f_1(x_1, \dot{x}_1) = I_{12}(x_1, x_2) + F_1(t)$,

3.29    $\ddot{x}_2 + f_2(x_2, \dot{x}_2) = I_{21}(x_1, x_2) + F_2(t)$.

In the context of the present paper it suffices to assume $F_j$, $j = 1, 2$,
as a random small variable, which can be easily mimicked on a digital
computer. At present, we cannot say much about the source of these
fluctuations from existing experimental data. However, ongoing experimental
work in which fine-wire electrodes are inserted into the finger muscles
involved, is exploring their possible neuromuscular origin (see also Goodman &
Kelso, 1983, for evidence pertaining to the relationship between physiological
tremor "fluctuations" and voluntary movement).

4.0 Numerical Results

In this section we present some numerical results that correspond to the
analytical treatment provided above. We solve the minimum model given by
eq. (3.6) along with the coupling (3.25) on a digital computer using a fourth
order Runge-Kutta method. To test the stability of a stationary solution
small random fluctuations of finite amplitude are introduced. The resulting
simulation shown in Figure 6 compares quite favorably with the experimental
data (e.g., Figure 1). In Figure 6a the displacements $x_1$, $x_2$ are plotted
over time and in Figure 6b the corresponding phase difference between the
oscillators is plotted for the same motions. As in the bimanual experiments,
the coupled oscillation is prepared in the state $\phi = \pi$ and the
frequency $\omega$ is increased monotonically. A transition from the
out-of-phase mode to the in-phase mode is observed when $\omega$ exceeds a
critical value. However, the frequency of the oscillation changes rather
quickly so that stationary oscillations are not reached. Thus the exact form
of the curves depends strongly on the noise level and the rate of changing
$\omega$.

Figure 6. A numerical simulation of the phase transition in voluntary
          cyclical hand movement. In Figure 6a the displacements of the
          oscillators and in Figure 6b the corresponding phase difference
          between the oscillators is plotted over time. The parameters of
          eq. (3.6, 3.25) were fixed at $\varepsilon = 1$, $\omega_0 r_0$
          [value illegible], $\alpha = -0.2$, $\beta = 0.2$. From the left to
          the right of the displays, $\omega$ changes from $\omega = 1.17$ to
          $\omega = 3.05$.



The steady state amplitudes for the in-phase mode and the out-of-phase
mode are shown in Figure 7. The unstable branch of the out-of-phase mode is
shown by dotted lines. The $\omega^{-1}$ dependence of the amplitudes is quite
clear. This feature is exhibited only by the simplified model equations and
will change if eq. (3.9) is used. As shown in Figure 7 for $\omega$ smaller
than $\omega_c$, the in-phase mode and the out-of-phase mode are both stable.
Due to the coexistence of two basins of attraction, the particular mode
observed depends on the initial conditions, i.e., which coordinative state is
prepared.



Figure 7. The steady state amplitudes of the in-phase mode (1) and the
          out-of-phase mode (2) are shown as a function of $\omega$. The other
          parameters are fixed at the same values as in Figure 6. Stable
          branches of the oscillations are shown by the solid lines, and the
          unstable branch by the dotted line.

If one starts in the antisymmetric phase and increases $\omega$ slowly, the
oscillation remains in this mode until the solution becomes unstable. At this
point a jump in amplitude occurs and the only stable stationary solution
revealed by the system corresponds to the in-phase mode. Such is the case
when $\omega$ is increased further. On the other hand, if $\omega$ is
decreased slowly the system stays within the basin of attraction of this
solution even when $\omega$ drops below $\omega_c$. As we mentioned earlier,
this hysteresis phenomenon is typical for such bistable situations. To
summarize, it is quite clear that the main features of the experimental data
described at the beginning of Section 2.0 are captured by the present
mathematical formulation as illustrated by these numerical results.
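A minimal sketch of such a simulation is given below for readers who wish to experiment. It is not the program used for Figures 6 and 7. It integrates eq. (3.6) for each hand with the coupling (3.25) at a fixed frequency, using a hand-written fourth order Runge-Kutta step, and it takes $a = \omega^2$ and $I_{21} = -I_{12}$ (the coupling with the indices exchanged). The values $\varepsilon = 1$, $\alpha = -0.2$, $\beta = 0.2$ follow the Figure 6 caption; $\omega_0 r_0 = 1$, the initial conditions, and the use of the $x_1$-$x_2$ correlation as a stand-in for relative phase are assumptions of this example. Random forces are replaced by a single small velocity offset.

```python
import math

def derivs(state, omega, eps=1.0, alpha=-0.2, beta=0.2, omega0=1.0, r0=1.0):
    """Eq. (3.6) for each hand with the coupling (3.25); state = (x1, v1, x2, v2).

    omega0 * r0 = 1 is an assumed value; the restoring coefficient is omega**2.
    """
    x1, v1, x2, v2 = state
    coupling = (v1 - v2) * (alpha + beta * (x1 - x2) ** 2)  # I_12; I_21 = -I_12
    a1 = -eps * (v1 * v1 - (omega0 * r0) ** 2) * v1 - omega ** 2 * x1 + coupling
    a2 = -eps * (v2 * v2 - (omega0 * r0) ** 2) * v2 - omega ** 2 * x2 - coupling
    return (v1, a1, v2, a2)

def rk4_step(state, dt, omega):
    """One classical fourth order Runge-Kutta step for the four-variable system."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))
    k1 = derivs(state, omega)
    k2 = derivs(shifted(k1, 0.5 * dt), omega)
    k3 = derivs(shifted(k2, 0.5 * dt), omega)
    k4 = derivs(shifted(k3, dt), omega)
    return tuple(s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def final_correlation(omega, t_total=200.0, dt=0.01):
    """Start near the antisymmetric mode; return the x1-x2 correlation at the end.

    A value near +1 indicates in-phase motion, near -1 antiphase motion.
    """
    state = (0.3, 0.0, -0.3, 0.05)  # small velocity offset breaks exact antisymmetry
    n = int(round(t_total / dt))
    s11 = s22 = s12 = 0.0
    for i in range(n):
        state = rk4_step(state, dt, omega)
        if i >= 0.9 * n:
            x1, _, x2, _ = state
            s11 += x1 * x1
            s22 += x2 * x2
            s12 += x1 * x2
    return s12 / math.sqrt(s11 * s22)

print("omega = 3.0  correlation =", final_correlation(3.0))
```

At a frequency well above the critical value, a run prepared close to the antisymmetric mode is expected to end in the symmetric mode, in line with (3.26) and (3.27). Lower frequencies and a slowly swept $\omega$ can be explored with the same functions, although the outcome near the transition will depend on the perturbation and the sweep rate, as noted in the text.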


5.0 Concluding Remarks 

In this paper we have introduced a minimal theoretical model that
reproduces a number of the observed facts. The hand movements are described
by two nonlinearly coupled oscillators that are self-sustained, i.e., not
driven from the outside. The assumption of autonomous limit cycle oscillators
is quite consistent with perturbation studies of two-handed cyclical
movements, showing that an unexpected perturbation to one hand does not disrupt
the phasing relation between the hands. The perturbed hand returns to its
limit cycle almost immediately (see Kelso, Holt, Rubin, & Kugler, 1981;
Yamanishi, Kawato, & Suzuki, 1980). Similar results in very different
preparations (e.g., Cohen, Holmes, & Rand, 1982; Willis, 1980, for reviews)
have also led to limit cycle models of neural pattern generation.

In the present model the frequency $\omega$ is defined as a control parameter
via the coefficient, a, of the restoring force. The model describes not only
the observed decrease in hand movement amplitudes with increasing frequency
$\omega$, but, more importantly, the phase transition, i.e., the change of
qualitative behavior from antisymmetric to symmetric hand movement. A relation
that automatically results from the equations is that the critical frequency
at which the transition takes place is determined via the amplitudes. This
prediction is now open to further experimental test. In future studies a
number of phenomena known to accompany phase transitions in synergetic systems
(e.g., critical slowing down; critical fluctuations) will be analyzed.

For the moment, any speculation on the origin of the coupling between the
two hands is certainly premature. One coupling may be established via the
corpus callosum, the well-known band of fibers that joins the two hemispheres
of the brain. On the other hand, recent experiments (Tuller & Kelso, 1984)
with patients whose corpus callosum has been severed, effectively cutting off
communication between the left and right cerebral hemispheres, show that even
in this case, control of the two index fingers in cyclical tasks is not
independent. When asked to follow two pacing lights whose phase was varied
between synchrony and alternation, split-brain subjects produced predominant
synchrony or alternation even when paced at intermediate phase values. This
bias in intermanual phase toward temporal symmetry is extremely powerful (see
e.g., Kelso, Southard, & Goodman, 1979; Yamanishi et al., 1980, for evidence
in normal populations) and suggests that the neural coupling for the voluntary
hand movements may be established subcortically.

In conclusion, although we have shown here how a transition from one
modal configuration to another is possible in our model, it remains for
further theoretical and experimental research to address how it is that only
two stable modes emerge in the first place from a wealth of possibilities,
i.e., how these particular cooperativities arise. What is clear, however,
from the present analysis, borne out by our numerical results, is the need to
characterize the individual oscillators as nonlinear. But more important, the
coupling between oscillators must be nonlinear for the phase transition to
occur. Although the present formulation clearly points to the important role
of nonlinearities in certain basic motor behaviors (i.e., the
frequency-amplitude relation in individual hand movements, modal transitions
between the hands), the physiological underpinnings of such nonlinearities
remain an open issue.

On the other hand, though their physiological basis may be obscure at
present, it is entirely reasonable to inquire how a complex neuromuscular
system might exploit these nonlinearities. Why are they important attributes
for a neural control system to possess? What are they for? First,
nonlinearity affords a stable coupling between the fundamental physical
variables of space and time (i.e., the amplitude-frequency relation). In a
linear system no such preferred coupling exists between these variables.
Second, nonlinearity provides a means by which switching among coordinative
states is possible (though other properties, e.g., fluctuations, play a key
role also). In principle, there is no reason to limit this conclusion to the
two phasing relations studied here. Thus, both of these attributes, we
hypothesize, guarantee, in the present context, stability and flexibility of
motor function.

Footnotes

¹In discussions of these experiments with D. Shapiro (Shapiro, 1981,
personal communication) it was learned that Cohen (1971) observed occasional
involuntary shifts into the in-phase coordinative mode when out-of-phase
motions at a single cycling frequency (3 Hz) were required. Cohen did not, to
our knowledge, examine the phenomenon further. Similar experimental findings
on bimanual finger movements have been reported by MacKenzie and Patla (1983)
and by Baldissera, Cavallari, and Civaschi (1982) on ipsilateral hand and foot
movements.

²Indeed, it was the slogan "Let your fingers do the walking," promoted by
advertisers of the Yellow Pages in U.S. telephone directories, that led to the
idea behind the present experiments.

³The reader must be warned that the word "phase" in the context of this
paper has two different meanings:

1) "phase" as a temporal relationship whose precise definition is given
in 2.1-2.3.

2) "phase" as a state of aggregation of matter (e.g., liquid or solid)
or, more generally, different modes of behavior. Therefore, in physics,
"phase transition" means transition from one state, e.g., fluid, to
another one, e.g., solid. In synergetics, transitions between
different dynamic states (e.g., behavioral modes) are also called
phase transitions.

Since in the present paper the behavioral modes are characterized by
definition (1), the notion "phase transition" is unique, in spite of the
double meaning of "phase."

⁴This becomes obvious when we change the origin of time so that
$\omega t + \phi_1 = \omega\tau$. In this case $x_1 = r_1 \cos(\omega\tau)$,
$x_2 = r_2 \cos(\omega\tau + \phi_2 - \phi_1)$.

⁵Our model can easily be generalized to include asymmetries.



REPETITIVE NAMING AND THE DETECTION OF WORD RETRIEVAL DEFICITS IN THE
BEGINNING READER*



Robert B. Katz† and Donald Shankweiler††



Abstract. The claim has been advanced that children with severe
reading disability are generally deficient in word retrieval
compared with normal readers. Support for the claim is based largely
on studies of rapid naming of repetitively presented pictured
objects or other nameable stimuli, a task that is apparently more
sensitive to retrieval problems than the confrontation naming of
items presented singly. The purpose of this study was to examine
whether there is a general relationship between word retrieval speed
and reading ability in beginning readers. Although such a
relationship has not been detected with confrontation naming,
repetitive naming may provide a more sensitive test. Accordingly,
second-grade children were required to name as rapidly as possible
repeated presentations of five pictured items drawn from a single
category. Separate naming tests were made for objects, colors,
animals, letters, and words. The results showed that there was no
relationship between reading ability and naming times when the test
items were selected from sets of objects, colors, or animals,
whereas on letters and words, a significant relationship was found.
The less-skilled readers were not, therefore, consistently slower in
all repetitive naming situations. Instead, their word retrieval
deficits extended only to the orthographic materials.

It has often been claimed that many elementary school children with
reading disorder experience word retrieval problems, a difficulty they share
with most adult aphasics (e.g., Goodglass, 1980; Howes, 1964). Support for
the claim derives largely from studies of children's performances on
object-naming tasks. The most widely used procedure for testing naming is by
a so-called confrontation naming test. The subject is presented with objects
one at a time and is required to name each item as it appears. Generally,
each pictured object is presented only once. Reading-disabled children have
been found to make a greater number of errors on this task than normal readers
(Denckla & Rudel, 1976a; Jansky & deHirsch, 1973; Mattis, French, & Rapin,
1975).

Other studies, however, have failed to find evidence of naming problems
in poor readers. Response times of less-skilled readers have been found not
to differ from those of skilled readers on objects, colors, or digits
(Perfetti, Finger, & Hogaboam, 1978; Stanovich, 1981); less-skilled readers
responded as quickly and as accurately even in studies using letters as
stimuli (Stanovich, 1981; Wolford & Fowler, 1983). The choice of subjects may
partially account for these discrepant findings. The first-mentioned studies
obtained subjects from learning disability clinics, whereas the latter
recruited poor readers from ordinary school classes. Thus, it is possible
that the first studies tested children with more severe reading disability and
more severe associated cognitive deficits than the second series. The task
employed may also account for the variability in the results. It is possible
that naming deficits may exist even in the more moderately impaired poor
readers, but that confrontation naming tasks are not sensitive enough to
detect them reliably.


This possibility is suggested by studies using another procedure. An
alternative naming task involves continuous rapid naming of a small set of
repeated pictured items, typically drawn from a single category. The task
requires the subject to scan the display of pictures arrayed in horizontal
rows and to name each picture in succession as rapidly as possible. Each item
occurs several times and at various positions in the display. Such a
repetitive naming test was used by Denckla and Rudel (1976b) to compare normal
readers with a group of severely disabled readers selected from special school
programs and clinics. The results indicated that the overall response times
of these poor readers were longer than those of normal readers on every
category of item tested (objects, digits, colors, and letters).

It is apparent that the two types of naming tasks are quite different in
the demands they make, and that each provides us with different information.
The repetitive naming task is the focus of our interest here because of the
discovery that response times on this task reliably differentiate normal and
poor readers even on producing response words of high frequency (Denckla &
Rudel, 1976b). These findings raise the possibility that some less-skilled
readers have a general problem in word retrieval that could not have been
discovered using the apparently less-sensitive confrontation naming test. But
to date, only one study (Blachman, 1981) has tested repetitive naming with a
whole (first-grade) school class. It was found that color naming and letter
naming correlated with reading ability, but object naming did not. However,
the age of the subjects in the Blachman study limits the conclusions that can
be drawn. Tested in the first grade, these children may not have been old
enough to permit the reading problem cases to be identified.


The purpose of the present study was to examine whether there is a
relationship between naming times on a repetitive naming task and reading
ability among second-grade children selected from ordinary school classes.
Naming ability was tested with the following categories of items: pictured
objects, pictured animals, colors, letters, and words. If the less-skilled
readers have consistently longer naming times than the skilled readers, then
they may have general word retrieval deficits. If, on the other hand, a
relationship between reading ability and naming time holds only for selected
categories of items, then it may be possible to delimit possible explanations
for why the less-skilled readers are slower at repetitive naming of these
types of test materials and not others. In the latter case, some potential
explanations, such as differential familiarity with the classes of stimulus
items, should be considered. Other explanations, based solely on the
mechanical aspects of repetitive naming, such as efficiency of scanning, could
be ruled out.

Method 

Subjects 

The subjects were the 18 children from two second-grade classes in a
suburban Connecticut public school, for whom parental permission for testing
was granted. Of these, two were excluded from testing because English was a
recent second language. The remaining 16 children were given the word
identification and word attack subtests of the Woodcock Reading Mastery Tests
(Woodcock, 1973) and the Peabody Picture Vocabulary Test (PPVT) (Dunn, 1959)
near the end of the school year. The eight children (6 females, 2 males) with
the highest combined raw scores on the two Woodcock subtests were designated
the skilled readers.¹ The eight remaining children (5 females, 3 males) were
designated the less-skilled readers.

The skilled readers had a mean combined Woodcock score of 140.2 (range:
129 to 164), compared with the less-skilled readers' mean of 92.6 (range: 50
to 128). Moreover, the two reading groups achieved significantly different
scores on each of the two component parts. On the word identification subtest
(which tests the reading of single words) alone, the skilled readers' average
reading grade level was 3.9 (range: 3.3 to 5.6), whereas the less-skilled
readers' average was 2.8 (range: 2.2 to 3.6), t(14) = 4.1, p = .002. On the
word attack subtest (which tests the reading of pseudowords), the skilled
readers' mean reading grade level was 5.8 (range: 4.1 to 12.9), compared with
the less-skilled readers' mean of 2.6 (range: 1.2 to 6.1), t(14) = 4.6,
p < .001. IQ scores derived from the PPVT yielded the following results: the
skilled readers obtained a mean IQ of 118.6 and the less-skilled readers 94.4,
t(14) = 4.7, p < .001. A difference in the mean age of the members of the two
groups was evident also; the skilled readers' mean age was 7 years, 9 months,
whereas the mean for the less-skilled readers was 8 years, 4 months,
t(14) = 3.9, p = .002. In anticipation of the results, we should mention that
the differences in IQ and age, which distinguished the two reading groups,
were apparently of no consequence.

Materials

The test stimuli (printed words, letters, colored squares, and line
drawings of objects) were arrayed in consecutive rows in a manner similar to
that of Denckla and Rudel (1974, 1976b). Each array consisted of five items of
a single category, each of which was presented a total of 10 times in a matrix
of 5 rows of 10 on white cardboard. The order of the items was random with
the constraints that no item immediately succeed itself and that each item
appear twice in every row. The categories, and the specific stimulus items in
each, were: 1) line drawings of animals (bird, cow, dog, goat, pig); 2)
objects (ball, box, door, hat, table); 3) colors (blue, green, red, yellow,
black); 4) lower-case letters (a, d, o, p, s); and 5) common words (ball,
box, door, hat, table). For each chart, a practice sequence consisting of the
five items was constructed.
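
The ordering constraints described above can be sketched in a few lines of
code. The sketch below is only an illustration of one way to satisfy them,
not the procedure the authors used to build their charts: the function names,
the rejection-sampling approach, and the decision to forbid repeats across
row boundaries as well as within rows are all assumptions.

```python
import random


def make_row(items, previous_last, rng):
    """Return one row of 10 in which each of the 5 items appears twice,
    no item follows itself, and the row does not start with the item
    that ended the previous row (an assumed reading of the constraint)."""
    pool = list(items) * 2
    while True:
        rng.shuffle(pool)
        no_repeat = all(x != y for x, y in zip(pool, pool[1:]))
        if no_repeat and pool[0] != previous_last:
            return list(pool)


def make_chart(items, n_rows=5, seed=None):
    """Build a chart of n_rows rows by drawing rows until each one
    satisfies the constraints (simple rejection sampling)."""
    rng = random.Random(seed)
    chart = []
    previous_last = None
    for _ in range(n_rows):
        row = make_row(items, previous_last, rng)
        chart.append(row)
        previous_last = row[-1]
    return chart


# The object names are those listed in the text; the seed is arbitrary.
objects_chart = make_chart(["ball", "box", "door", "hat", "table"], seed=1)
for row in objects_chart:
    print(" ".join(row))
```

With five rows, each item appears 10 times in total, as in the charts
described above.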

Procedure

Each child was tested individually in one 30-min session. At the
beginning of the session, the child was given the Woodcock and PPVT. Following
a brief memory test, the naming tests were given. The charts were always
presented in the following order: objects, colors, letters, words, animals.
Before being tested on a chart, the child was asked to name the five items in
the practice sequence in order to ensure that standard names could be elicited
to each item. Every child was able to do this without hesitation.² Then the
subject was instructed to name each item on the chart as rapidly as possible
without making any mistakes, following along the rows from left to right. The
child began the task on a signal from the experimenter. The experimenter
responded to hesitations with the injunction to "Keep going." A stopwatch was
used to measure the time from the child's first response to the last response.

Results 

The first point to be noted is that naming errors occurred infrequently,
as was expected. Accordingly, the data base consisted of the mean naming
times on each class of items. These are shown in Table 1. It can be seen
from the table that the naming time varied considerably with item type. The
order of response times across item types is generally consistent with that of
previous repetitive naming studies (Biemiller, 1977-1978; Blachman, 1981;
Denckla & Rudel, 1974, 1976b). Moreover, it is a standard finding that it
takes longer to identify pictured objects than to read the objects' names.
This result has been obtained both with adults (Cattell, 1886; Potter &
Faulconer, 1975) and with children (Ligon, 1932; Seymour & Porpodas, 1980). It
is of interest to discover that it applies even to readers so near the
beginning stages of skill acquisition. Examining the effect of reading
ability, we note that there was only a small overall difference in mean naming
time between the skilled and the less-skilled readers. This was due to the
fact that the less-skilled readers tended to be faster than the skilled
readers on objects and animals, but slower on letters and words.



Table 1

Mean Naming Time (sec) for Each Item Type by Reading Level

                 Skilled          Less-Skilled
Type             Readers          Readers

Animals          53.3             50.[?]
Objects          56.2             5[?].1
Colors           [illegible]      [?]6.0
Letters          [illegible]      31.9
Words            [?]9.[?]         [illegible]

Mean             [illegible]      [illegible]

Note. Entries marked [?] or [illegible] could not be recovered from the
scanned source.

The data were subjected to an analysis of variance with one
between-groups factor (reading ability) and one within-groups factor (item
type). The analysis indicated that the main effect of reading ability was not
significant, F < 1, whereas the main effect of item type was highly
significant, F(4,56) = 7[?].3, p < .001. Furthermore, the interaction between
reading ability and item type was significant, F(4,56) = 3.5, p = .013.
Fine-grained analyses of this interaction were conducted using protected
t-tests (Cohen & Cohen, 1975). These analyses indicated that for objects,
animals, and colors the naming times of skilled and less-skilled readers were
not significantly different: animals, t(14) = -.6, p > .5; objects,
t(14) = -.8, p = .409; colors, t(14) = .3, p > .5. In contrast, significant
differences were found on both letters, t(14) = 2.5, p = .026, and words,
t(14) = 2.8, p = .015.³
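
The degrees of freedom reported for these F ratios follow directly from the
design: 16 children, two reading groups, and five item types. The short sketch
below recomputes them; it is an illustrative check only, and the function and
key names are hypothetical.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom for a design with one between-groups factor
    and one within-groups (repeated-measures) factor."""
    between = n_groups - 1                  # reading ability
    between_error = n_subjects - n_groups   # subjects within groups
    within = n_levels - 1                   # item type
    interaction = between * within          # reading ability x item type
    within_error = between_error * within   # item type x subjects within groups
    return {
        "reading_ability": (between, between_error),
        "item_type": (within, within_error),
        "interaction": (interaction, within_error),
    }


# 16 children, 2 reading groups, 5 item types, as described in the text.
df = mixed_anova_df(n_subjects=16, n_groups=2, n_levels=5)
print(df)
```

The item-type and interaction terms both come out at (4, 56), matching the
F(4,56) values in the text, and the between-groups error term of 14 matches
the degrees of freedom of the t-tests.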

It could be maintained that the analysis of variance obscures a general
relationship between reading ability and naming time, since the children were
divided into only two levels of reading ability, thus eliminating fine
distinctions in actual reading skill. A correlational analysis was therefore
carried out to assess the degree of relationship between the children's actual
reading scores and their naming times. These correlations, which are shown in
Table 2, form two groups of clustered variables. The first cluster consists
of animals, objects, and colors; the second is formed by the reading score,
letters, words, and colors. The variables in the first cluster have little
relationship with reading score. Moreover, with the exception of colors, the
variables in one cluster correlate relatively little with those in the other
cluster. The correlations clearly indicate, as did the previous analyses,
that there is a strong relationship between reading ability and the naming
times for letters and words, but not for objects, animals, and colors.
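
The significance levels attached to correlations of this kind can be checked
by converting each r to a t statistic with n - 2 degrees of freedom. The
sketch below does this for n = 16 children. It is an illustrative check, not
the authors' analysis: the function names are hypothetical, and the critical
values are the standard two-tailed table values for 14 degrees of freedom.

```python
import math

N_CHILDREN = 16  # number of children tested, from the Subjects section

# Two-tailed critical values of t for df = 14 (standard table values),
# ordered from the strictest level to the most lenient.
CRITICAL_T_DF14 = [(0.001, 4.140), (0.01, 2.977), (0.05, 2.145)]


def r_to_t(r, n=N_CHILDREN):
    """Convert a Pearson correlation to a t statistic with n - 2 df."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))


def significance_label(r):
    """Return the strictest tabled level reached by r when n = 16."""
    t = abs(r_to_t(r))
    for alpha, critical in CRITICAL_T_DF14:
        if t >= critical:
            return f"p < {alpha}"
    return "n.s."


# Correlations of reading score with letter and word naming times.
for name, r in [("letters", -0.58), ("words", -0.70)]:
    print(name, round(r_to_t(r), 2), significance_label(r))
```

For example, r = -.58 corresponds to |t| of about 2.66, which exceeds the .05
critical value but not the .01 value.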



Table 2

Correlations between Naming Times and Reading Score

Variable             1      2       3        4        5        6

1. Reading Score     --    .01     .12     -.07     -.58*    -.70**
2. Animals                  --     .87***   .87***   .13      .31
3. Objects                          --      .71**    .42      .30
4. Colors                                    --      .52*     .34
5. Letters                                            --      .62*
6. Words                                                       --

*p < .05
**p < .01
***p < .001

Discussion

The purpose of the present experiment was first to examine whether word
retrieval ability (independent of the reading process per se) is related to
reading ability. Second, we wished to discover whether any difficulties that
might be found are of a general nature or are circumscribed, becoming manifest
in the retrieval of only certain classes of stimuli. The results of our study
of rapid naming of repetitively presented items indicated that the
less-skilled readers were slower than the skilled readers at naming letters
and words, but responded as quickly on objects, animals, or colors. Thus, the
less-skilled readers were slower only on orthographic material.

The pattern of results obtained in this study raises two questions.
First, why are less-skilled readers slower than skilled readers at repetitive
naming of orthographic material but not on pictorial material? Because
less-skilled readers responded as quickly as skilled readers on some classes
of items, we can rule out those arguments that invoke reading-related
differences in scanning rates, response strategies, use of peripheral
information, response interference, or visual-verbal association. Similarly,
although the less-skilled readers had lower IQ scores than the skilled
readers, this relationship cannot account for the results. The lower IQ
scores would have been expected to lengthen the naming times of the
less-skilled readers on all five classes of stimuli. Although it is possible
that lower IQ scores led to longer naming times on letters and words, other
findings make this implausible. Other tests of repetitive letter naming using
skilled and less-skilled readers with equivalent IQ scores (Biemiller,
1977-1978; Staller & Sekuler, 1975) also found an effect of reading skill on
naming time.

One might also suppose that the obtained response time differences may be
a function of familiarity with stimulus items. Since the less-skilled readers
have read less extensively, they may well be less practiced than the skilled
readers in identifying letters and printed words; this is less likely to be
the case with objects, animals, or colors. Such experience-related effects
could be expected to work against the less-skilled readers when tested on
letter naming and word naming. It may be the case that differential
familiarity with letters and words creates differences in the attentional
resources needed by skilled and less-skilled readers to name items selected
from these categories. Although the performance of skilled and less-skilled
readers on naming letters presented singly has been found to be equivalent
(Stanovich, 1981; Wolford & Fowler, 1983), the members of the two reading
groups may still differ in the extent to which letter naming is automatized
(LaBerge & Samuels, 1974). Thus, a less-skilled reader may have to invest a
large share of processing capacity in order to name a letter as quickly as a
skilled reader, who may have to devote relatively less attention to the task.
Such differences in degree of automatization might be expected to become
manifest only on repetitive naming tasks, since these tasks require that
subjects do more than simply retrieve a single name. The additional task
requirements of scanning and responding sequentially may be sufficient to
expose reading-related differences in the rate at which certain types of items
can be named. That is, naming letters, unlike naming pictured objects, is a
somewhat fragile skill in poor beginning readers. It is easily disrupted when
other factors related to sequential responding complicate the task. This
possibility could account for the difference in outcome between our results on
repetitive letter naming and those on naming single letters (Stanovich, 1981;
Wolford & Fowler, 1983).

A second question that must be entertained concerns the source of the
differences between our results and those of other studies that might be
considered comparable. A comparison of our results with those of Blachman
(1981) and Wolf (1981) suggests that the relationship between repetitive
naming and reading scores varies with grade level; it is most robust at the
very early stages of learning to read, after which it diminishes. Conceivably,
this pattern of results is indicative of a developmental trend in which the
less-skilled reader recovers from a general slowness in naming or acquires an
increasing level of automatization for processing certain types of items. One
may wonder, then, whether less-skilled readers who are slightly older than
those studied here will show equal letter naming times relative to skilled
readers of the same age. Previous studies (Biemiller, 1977-1978; Staller &
Sekuler, 1975; Wolf, 1981) suggest that they will not, but additional
research is needed to clarify the matter.

Our findings show that the less-skilled readers, compared with the 
skilled readers, were not characterized by a general slowness on repetitive 
naming. Therefore, the results indicate that less-skilled readers do not have 
a general problem in retrieving names. Rather, their word retrieval deficits 
may extend only to orthographic items. 

It should be noted that neither the present study nor earlier studies
that exploit the repetitive naming paradigm permit any definitive conclusion
concerning possible deficiencies in the general ability of skilled and
less-skilled readers to access their mental lexicons. The stimuli employed
were limited to a small set of very common items. Moreover, since the items
were known to the subjects prior to testing, it is likely that their names
were stored in a temporary short-term buffer. Lexical memory may not have
played an important role. Thus, the requirements of repetitive naming may
differ from those of confrontation naming. In the latter, names must be
retrieved from the lexicon itself because the test items are not made known to
the subject beforehand. Research has shown that good and poor readers do
indeed sometimes differ on confrontation naming (Denckla & Rudel, 1976a;
Jansky & deHirsch, 1973; Katz, in press; Mattis et al., 1975; Wolf, 1981).
Furthermore, data obtained on a confrontation naming task (Katz, in press)
suggest that poor readers are able to access the lexicon adequately; it was
hypothesized that the naming problems arise because of deficiencies in
processing phonological information stored at specific lexical addresses and
possibly also because of deficiencies in the quality or completeness of the
phonological specifications.

Footnotes

¹The use of combined scores seemed justified in light of the high
correlation between the children's scores on the two Woodcock subtests,
r(14) = .89, p < .001.

²All the children were also very accurate at naming the items, except for
one boy who consistently read the word "ball" as "bell."

³The skilled readers' mean naming times were affected by the extreme
times of one subject. Contrary to the expectations based on previous
findings, this child took several seconds longer than any other subject on the
object, color, and animal charts. With her data eliminated, the mean times of
the skilled readers are somewhat faster: animals, 48.9 sec; objects, 53.0;
colors, 41.1; letters, 25.1; words, 28.7. Nevertheless, the statistical
effects reported here are maintained.

⁴Wolf's findings only recently came to our attention. Although the
motivation for her study differed from ours, the results largely support our
findings.

The research effort over the past several years by the Haskins
Laboratories reading research group has bolstered our conviction that the
problem of most beginners who have difficulties in acquiring literacy is
basically linguistic in nature. That is, in our view, the problem is not
visual or auditory or motor, as many have proposed, but lies rather in the
ineffective use of phonologic strategies. We have found this linguistic
deficiency in regard to two major requirements of reading proficiency: lexical
access and representation in short-term memory. We have recently also begun
to look more closely at spelling from the standpoint of linguistic
sophistication. In this paper we will describe two recent studies we have
done that are concerned with linguistic abilities and spelling, one in a group
of kindergarteners and the other in an adult literacy class. But, first,
since the nature of the orthography is a central consideration in spelling, we
should like to prepare the way by describing our assumptions about how the
alphabetic orthography represents language, assumptions that are, in effect,
the guiding principles of our research.

Some Guiding Principles 

Everyone would agree, we believe, that an orthography represents a
language, that languages are used to convey meaning, and that words are the
basic units by which languages do that. What is often forgotten, however, is
that whether one receives language by eye or by ear, one must get to the word
before one can get to its meaning and that a word exists apart from its
typically various meanings. Moreover, a word has a uniquely linguistic,
complex, phonological structure that must be somehow apprehended before one
can deal with a message conveyed by language, whether written or spoken. In
the primary language functions of speaking and listening, a special processor
copes with that phonological structure in operations that normally function
naturally, quite automatically, and below the level of awareness. One does
not need to understand how it works in order to speak and listen. In
contrast, both the reader and writer must have some fundamental understanding
of the structure. Indeed, they must become, to some degree, linguists of
sorts, sufficiently aware of the phonology to be able to divide the spoken
language into the constituent segments that the orthography represents. How
easy or how difficult that will be will largely depend on the nature of the
linguistic segment that the orthography represents.

In the case of orthographies in which the segment represented is the
word, as it is in the Chinese logography and the Japanese kanji, or the
syllable, as it is in the Japanese kana,¹ we contend that the beginner's task
is relatively simple. If what they need to do is separate the word or
syllable from the speech stream for purposes of pairing it with its
appropriate orthographic unit, they will find it to be readily isolable. This
has been found to be true both here and abroad in a number of experiments with
young children (Alegria & Content, 1983; Fox & Routh, 1975; Holden &
MacGinitie, 1972).

The problem faced by beginners in an alphabetic orthography is much more
difficult. The essence of the problem can be put in this way: though it is
often said that an alphabetic orthography represents speech or ought to, it
is, in fact, an abstraction from speech. Although it does bear a fairly
regular relation to speech, the nature of that relation will be hard for a
child, or indeed any beginner, young or old, to apprehend. To understand why
that is so, we would remark briefly, first, on why it is misleading to say
that the alphabetic orthography represents the sounds of speech; second, why
it is also misleading to say that it does or should represent speech
phonetically; and, finally, what it means to say that it is an abstraction
from speech. Let us consider these remarks one at a time.

The Alphabet Does Not Represent the Sounds of Speech

The alphabetic orthography does not transcribe the sounds of speech. In
the first place, the letters obviously do not portray acoustic events, as they
might if they were bits of oscillograms or spectrograms. So in that rather
trivial sense, the alphabet certainly does not represent the sounds of speech.
However, there is a more significant sense in which the alphabet does not
transcribe sounds. The point that needs to be made, and which is not trivial,
is that the segmentation of the sound does not correspond to the segmentation
of the letters. To take a simple example, it would be impossible to divide a
recording of the spoken word, "big," into three parts, such that when played
back, one part would be "buh," a second part "ih," and the last part "guh."
That is because in the spoken syllable, "big," there is only one piece of
sound and the three phonological segments we write with the letters B, I, and
G are nearly simultaneously encoded into it.






Liberman et al.: Linguistic Abilities and Spelling 



Encoding several segments of the phonology into one segment of sound
provides an important gain in efficiency for the listener, who, as we have said,
has a built-in processor nicely equipped to deal with it automatically. But
it has quite adverse consequences for the beginner dealing with the written
language. One unfortunate consequence of the very odd relation between
phonological structure and sound is that the phoneme, which is the segment
represented by the alphabetic orthography, unlike the word and the syllable,
is not easily separable from the speech stream. If we had not found this to
be so in our research with children (Liberman, Shankweiler, Fischer, & Carter,
1974), we should have suspected it from the history of writing systems: the
system using an alphabetic unit was the last to be developed, long after
logographies and syllabaries. Moreover, unlike those others, the basic unit
of the alphabetic orthography, the phoneme, was apparently discovered only
once and all other alphabets were later adapted from that original, brilliant
discovery (Diringer, 1948).

If readers and writers must be able to appreciate the relationship
between the orthographic character and the linguistic unit it represents, as we
believe they must, then beginning learners of an alphabetic system are put at
a disadvantage initially. They will find it difficult to see the relation
between spelling and sound. And it will even be difficult for teachers to
demonstrate that relationship to them. If teachers wish to do this with even
a simple word like "big," they will try to isolate three sounds and in the
process will unavoidably produce, not three phonemes, but three syllables:
"buh," "ih," and "guh." Put together, these form a nonsense trisyllable
"buhihguh" and not the monosyllable "big" that comprises the three
phonological segments we spell as B-I-G (see Liberman, 1983, for a more complete
discussion of this point and of the two sections that follow).

The Alphabet Does Not Represent the Phonetic Surface of Speech 

If it is now evident that the alphabet is not a transcription of the 
sounds of speech, what about the alphabetic orthography as a phonetic 
transcription? It is, of course, possible to use an alphabet phonetically. 
Linguists do just that when they use a phonetic transcription to represent as 
precisely as possible what they perceive when they listen to speech. 
Unfortunately, the wealth of phonetic information that our natural 
speech-perceiving mechanisms can use creates serious problems when, as in 
reading and writing, we try to put all that information through the eye. 

A phonetic transcription, such as the linguist uses, preserves much sur- 
face information that is not represented in any alphabetic orthography. It 
includes all the context-conditioned variations of speech both within words 
and across syllable and word boundaries. For example, in a phonetic 
transcription, the plural s in "cats" would be transcribed as s but its
counterpart in "dogs" would be z to reflect its pronunciation in that context. To
take another example, the final consonant in the word, "sit," would be
transcribed as t, but what we ordinarily consider to be the same consonant in the
related word, "sitter," would have to be changed from t to d to reflect
accurately that manner change in our pronunciation. Similarly, in American
English, the contraction "what's" would be transcribed differently in the
context of "What's he saying?" from its rendition in the context of "What's your
idea?", where, because of context-conditioned effects, it would be
coarticulated with "your" to produce "Wuhchu (idea)?"
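
The cats/dogs contrast can be made concrete with a small sketch. The fragment below is illustrative only: the function name, the ARPAbet-style segment labels, and the two segment sets are assumptions introduced for this example and are not part of the study. It shows how a single written plural ending corresponds to different phonetic surface forms, depending on the segment that precedes it.

```python
# Illustrative sketch: one written plural ending, several phonetic surface forms.
# Segment labels are ARPAbet-style and were chosen for this example only.

SIBILANTS = {"S", "Z", "SH", "ZH", "CH", "JH"}
VOICELESS = {"P", "T", "K", "F", "TH"}


def plural_surface_form(final_segment: str) -> str:
    """Return the phonetic shape of the regular English plural suffix
    after a stem whose last segment is `final_segment`."""
    if final_segment in SIBILANTS:
        return "IH Z"   # as in "buses"
    if final_segment in VOICELESS:
        return "S"      # as in "cats"
    return "Z"          # as in "dogs" (voiced consonants and vowels)


for word, final in [("cat", "T"), ("dog", "G"), ("bus", "S")]:
    print(word, "->", plural_surface_form(final))
```

A strictly phonetic transcription would have to record each of these outcomes separately, whereas the conventional spelling keeps the plural ending visually constant.
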

In view of these confusing context-conditioned variations, one would
suppose that it should be extremely difficult to apprehend messages conveyed by
means of a strictly phonetic transcription. And so it is, in fact. To be
sure, any literate adult can learn to decode a phonetic transcription more
easily than s/he can decode the visual display of acoustic events in a
spectrogram, but even highly trained phoneticians cannot read an unfamiliar text
written phonetically as fluently as they would the same passage written in our
English orthography. The representation of context-driven articulatory
distinctions, to say nothing about differences in linguistic stress, emphasis,
idiolect, and dialect, seriously detracts from the broader requirements of
language representation. This is certainly a case in which, except for very
specialized purposes, more is definitely not better.

The Alphabet Represents Phonological Structures 


Given that reading the sounds of speech is hard and reading a phonetic 
transcription is only slightly less so, what is it that an alphabet should 
represent if reading is to be made as easy as possible? Presumably, in the 
ideal case, the representation should match the way words are organized in our 
heads, in what linguists refer to as our lexicons. It stands to reason that 
our lexicons must be organized in terms of phonological or morphophonological
(footnote 2) segments that are sufficiently abstract to stand above the many
variations at the auditory and phonetic surfaces. We have described the
difficulties we get into when, in trying to put language in by eye, we begin
with the variable auditory and phonetic forms. To get around these problems,
we would want ideally to have words spelled in a way that matches the abstract
(morpho)phonological structures as they must be stored in the speaker's lexicon. But that is
an ideal that is not easily achieved. 

The problem is that there are undoubtedly great differences among speak- 
ers of the language in the way in which their lexicons are organized and in 
exactly how abstractly the items are entered. To take one example, for the 
would-be reader who understands that such pairs as heal/health, steal/stealth, 
and even weal/wealth are related, the individual members of those pairs might 
well be entered quite differently than they would be for the reader who has
never noted those relationships. The entries of the former reader would in 
this instance be closer to the way English spelling deals with the language. 

For better or worse, English spelling happens to be quite far out on the
abstractness dimension, rising considerably above the phonetic surface
variations to preserve the identities of lexical cognates. The spelling is, in
this sense, morphophonological in nature. As such, it necessarily must strain
the linguistic sophistication of many would-be readers and spellers. The
young child is especially likely to lack even the tacit knowledge, what we
have elsewhere called "phonological maturity" (Liberman, Liberman, Mattingly,
& Shankweiler, 1980), that is needed to rationalize so much of the spelling.
For example, the use of the same alphabetic characters for phonological
segments that are phonetically quite different, as in such pairs as
muscle/muscular and magic/magician, preserves the morphemic relations of the words and
thus may increase fluency and efficient comprehension for a mature reader but
would serve only as a roadblock to the young child who is trying to figure out
how the system works.

In summary, the point that should be emphasized here is that no matter
how abstract it may often be and how far or how close to a given reader's
lexicon, the alphabetic orthography does represent the (morpho)phonological
structure of the spoken word, not its sounds or its phonetic surface. Now we
can consider how this characteristic of the orthography relates to the
attainment of literacy.

Linguistic Awareness and Reading

It has been our contention that in addition to the obvious need to have
some command of the spoken language and the ability to discriminate the
graphic symbols, the first requirement for beginning readers is to acquire a
certain degree of linguistic sophistication, beyond that required for speaking
and listening. One important aspect of linguistic sophistication is what
Mattingly (1972) of the Haskins research group has dubbed "linguistic awareness."
Though it could be taken to have a more general connotation, this term has
been defined in a rather special way in the context of initial reading
acquisition, to refer to the awareness of the units of speech that are
represented by the orthography. In the alphabetic orthography, the phoneme,
the unit of which the learner must become aware, is a constituent part of
larger units, the word and the syllable, both of which have considerably more 
salience, as we have said. Awareness that these larger units have parts and 
the ability to identify those parts does not come easily and does not happen 
all at once. The ability to segment speech into its constituent units, of 
whatever size, has been found to show improvement from ages four to six or
seven (Calfee, Chapman, & Venezky, 1970; Liberman, 1973; Treiman & Baron,
1981). But in this developmental sequence, awareness of the phonemic unit is
always harder, develops later, and is generally found to be a more sensitive
predictor of reading skills in kindergarteners and first graders than
awareness of the syllable or word. There is by now a long list of studies,
originating both here and abroad, which have been strongly supportive of
phoneme segmentation skill as a predictor of reading ability. Among the
studies that come to mind (and there are surely others as well) are Blachman
(1983), Helfgott (1976), Mann and Liberman (1984), Zifcak (1981) from our
research group, Bradley and Bryant (1983) in England, Lundberg and associates
(1980) in Sweden, Fox and Routh (1975) in the States, and Bertelson's
laboratory (Alegria & Content, 1983) in Belgium.

Most of the previous research on linguistic awareness and the acquisition
of literacy has been concerned with the attainment of reading skills.
Recently, we have begun to look more closely at spelling from this vantage point.
We should like to report on two of these investigations in this paper, one in
which we examined the invented spellings of kindergarteners and the other in
which we explored the virtually uncharted territory of linguistic factors in
adult illiteracy.

Linguistic Abilities and the Invented Spellings of Kindergarteners 

The first study we will describe looked into the linguistic abilities of 
kindergarteners in relation to their skill in invented spelling. Before
reporting on our findings, we should take a moment to say what we mean by
invented spellings. When spelling words in their spontaneous writings,
preschoolers are, of course, limited by their meager orthographic knowledge,
which in the beginning may include only the knowledge of names of letters
("bee" and "dee," for example). In his seminal work in the early seventies,
Charles Read (1971) demonstrated that the invented spellings of preschoolers
display a predictable pattern in their choice of the letter symbols used to
represent the spoken language. Relying on their apparently quite acute
perception of the phonetic, surface features of both the utterance to be
recorded and the letter names they know, young children begin by devising what
amounts to a primitive phonetic transcription, rather than the phonological 
representation of our spelling system. 

To the extent that English is written abstractly, it assumes, as we said
earlier, a user who has, to a considerable degree, what we have called
"phonological maturity" (Liberman et al., 1980). These younger children clearly do
not have the requisite degree that our orthography demands. Given a word like
"train," to borrow one of Read's examples, a preschooler might produce an H as
the first letter of the word in an attempt to represent the first phone in
their own spoken version of the word ("chrain") by its closest counterpart in
a letter name they know ("aitch"). As the children begin to develop more
sophisticated apprehensions of the phonology, the purely phonetic
transcription becomes less prevalent in their spellings. They begin to assimilate the
rules according to which our abstract spelling makes sense. 

This growing awareness of the (morpho)phonological rule structure of 
words is implicit in their invented spelling productions. Therefore, if one 
is interested, as we are, in evaluating the level of linguistic sophistication 
in children's spellings, it is possible to do so by constructing a scoring 
system that is fashioned to reflect that awareness, rather than being limited 
to a consideration simply of right/wrong judgments. Louisa Cook Moats (1983), 
using an analytic scoring system of this kind, did a pioneering study in which 
she found the misspellings of dyslexic fourth through eighth graders to be 
quite similar linguistically to those of nondyslexic second graders. 

We were curious to learn whether we could find possible precursors of 
linguistic deficiencies in the spelling of much younger children, those in a
public school kindergarten class. Accordingly, we chose to examine the
relationship between kindergarteners' proficiency in invented spelling and
their other linguistic abilities.

All the children in the kindergarten class were given a dictated, real 
word spelling test. A given word could receive a score from 0 (for simply
random letters) to 6 (for a correct English spelling). In this scoring
system, we measured the children's spelling proficiency along two dimensions: the
number of phonemes that the child included in spelling the word and also the
level of the orthographic representation. Thus, for the target word, "sick,"
increasing scores would be awarded for the following sequences of responses:
one phoneme with conventional letters (s, c); more than one phoneme but not
all (sk, ck); all phonemes with phonetically related letters (sec, sek); all
phonemes with conventional letters (sic, sik); all with correct spelling
(sick).
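
The ordering of response types for "sick" can be sketched as a simple lookup. The code below is a hedged illustration, not the scoring instrument used in the study: the class and function names are hypothetical, and the integer ranks are ordinal placeholders only, since the text gives the end points of the scale (0 and 6) but not the point value of each intermediate category.

```python
from enum import IntEnum
from typing import Optional


class SpellingLevel(IntEnum):
    """Ordinal ranks only (assumption); not the study's actual 0-6 point values."""
    RANDOM_LETTERS = 0
    ONE_PHONEME_CONVENTIONAL = 1
    SOME_PHONEMES = 2
    ALL_PHONEMES_RELATED_LETTERS = 3
    ALL_PHONEMES_CONVENTIONAL = 4
    CORRECT_SPELLING = 5


# Example responses for the target word "sick", taken from the text above.
SICK_EXAMPLES = {
    "s": SpellingLevel.ONE_PHONEME_CONVENTIONAL,
    "c": SpellingLevel.ONE_PHONEME_CONVENTIONAL,
    "sk": SpellingLevel.SOME_PHONEMES,
    "ck": SpellingLevel.SOME_PHONEMES,
    "sec": SpellingLevel.ALL_PHONEMES_RELATED_LETTERS,
    "sek": SpellingLevel.ALL_PHONEMES_RELATED_LETTERS,
    "sic": SpellingLevel.ALL_PHONEMES_CONVENTIONAL,
    "sik": SpellingLevel.ALL_PHONEMES_CONVENTIONAL,
    "sick": SpellingLevel.CORRECT_SPELLING,
}


def level_for_sick(response: str) -> Optional[SpellingLevel]:
    """Look up a response among the listed examples.

    Returns None for any response not listed in the text, rather than
    guessing how a scorer would have classified it.
    """
    return SICK_EXAMPLES.get(response.strip().lower())


for attempt in ["s", "sk", "sek", "sik", "sick"]:
    print(attempt, level_for_sick(attempt).name)
```

The point of such a scheme is that a response like "sek" earns more credit than "sk" even though both are wrong by a right/wrong criterion.
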

In addition to the spelling test, eight language-based tasks were
administered to the class. We found that four of these made a difference in a
multiple regression analysis, accounting for [...] of the variance in invented
spelling proficiency. Of these, a phoneme segmentation task patterned after
Elkonin (1973) accounted for 61% of the variance in invented spelling
performance and one that measured phoneme dictation accounted for an additional 20%.
A phoneme deletion task ("Say meat without the 'm'"), adapted from Rosner's
Test of Auditory Analysis Skills (Rosner, 1975), added another 6%; and a
measure of expressive vocabulary, the Boston Naming Test (Kaplan, Goodglass, &
Weintraub, 1976), 1%.

The four other language-based tasks in our study did not contribute
significantly to the variance of the invented spelling performance. They
included a test of receptive vocabulary (the Peabody Picture Vocabulary Test, Dunn &
Dunn, 1981); a syllable deletion task ("Say cowboy without the 'cow'"), also
adapted from Rosner's TAAS (1975); word repetition (correctly repeating words
spoken by the examiner); naming letters and writing letters to dictation.
Analysis of the children's performance on word repetition, letter naming, and
letter writing revealed only developmentally appropriate errors such as slight
infantilisms in articulation and occasional confusions of visually similar
letters in writing. In regard to the latter type of error, it is of interest
to note that letter reversals, though present in some protocols, were not
found to be related to invented spelling ability. That is, children could be
good invented spellers at the kindergarten level without having fully mastered
the correct orientation of the reversible letters.

This study suggests that spelling skill develops systematically as young 
children master the ability to analyze words into their constituent phonemes. 
That conclusion is supported by the strength of phoneme segmentation ability, 
writing phonemes to dictation, and the deletion of phonemes as predictors of 
invented spelling ability. These three tasks all require a degree of explicit 
awareness of internal phonological word structure that is not tapped by the 
other language tasks. The other tasks all reflect certain aspects of language 
development but either do not include the analytic component at all (Peabody 
Picture Vocabulary, letter naming, letter writing, and word repetition) or tap 
it at a less abstract level, closer to the basic unit of articulation
(syllable deletion).
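
The contrast between the two deletion tasks can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the helper name, the ARPAbet-style phoneme labels for "meat," and the pre-segmented inputs. The mechanical deletion step is identical in both cases; what differs for the child is how hard it is to arrive at the segmented units in the first place.

```python
from typing import List


def delete_unit(units: List[str], target: str) -> List[str]:
    """Return a copy of `units` with the first occurrence of `target` removed."""
    remaining = list(units)
    remaining.remove(target)  # raises ValueError if the unit is absent
    return remaining


# Phoneme deletion: "Say meat without the 'm'".
# Hypothetical ARPAbet-style segmentation of the monosyllable.
MEAT_PHONEMES = ["M", "IY", "T"]

# Syllable deletion: "Say cowboy without the 'cow'".
COWBOY_SYLLABLES = ["cow", "boy"]

print(delete_unit(MEAT_PHONEMES, "M"))        # segments of "eat"
print(delete_unit(COWBOY_SYLLABLES, "cow"))   # the syllable "boy"
```

In the sketch the segmented lists are simply given. For the child, the syllables of "cowboy" are readily available as units of articulation, whereas the three phonemes of "meat" are merged into a single syllable and must be recovered by explicit analysis.
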

Linguistic Abilities and Adult Poor Spellers 

Among kindergarteners, then, we had found that the children who were the
better spellers in the class exhibited better skills as well in analyzing the
phonemic constituents of words. Recently, we examined a group of adults
enrolled in a community literacy class with a view toward finding out whether
their profiles would be similar to those of younger learners. The subjects in
the study were nine men whose occupations ranged from lower-level management
to semi-skilled labor, all of whom reported serious difficulties with
spelling. Five of the men had repeated a grade in school, but only one had
received remedial assistance.

Once again, as we had with the kindergarteners, we administered a number
of tests of language ability as well as measures of spelling proficiency. To
determine the kinds of spelling problems characteristic of adult illiterates,
we used two dictated lists of spelling words. One list was taken from the
spelling subtest of the Gallistel-Ellis Test of Coding Skills (1974), which
includes real words of both regular and irregular orthographic construction.
The other was a list of pseudowords taken from the reading subtest of the
Gallistel-Ellis Test. In order to provide a comparison with their spelling
proficiency in spontaneous writing, we also collected writing samples, using the
stimulus pictures of the Test of Written Language (Hammill & Larson, 1978).

Four tests of language ability were included in the adult study. As a
check on the possibility that gross problems in speech perception might be at
the root of their difficulties, we examined the performance of the adults on
the Sound Mimicry subtest of the Goldman-Fristoe-Woodcock Auditory Skills Test
Battery (1974). In this subtest subjects are required to repeat taped
nonsense words, one to three syllables in length.

In view of previous findings with children that have suggested that poor
readers and spellers may have difficulty analyzing other aspects of internal
structure of words (Carlisle, 1984; Rubin, 1984), we wished to measure the
analytic abilities of the adults at both the phonemic and morphemic levels of
language. The test used for phonemic analysis was the Sound Analysis subtest
of the Goldman-Fristoe-Woodcock Auditory Skills Test Battery (1974). Here,
monosyllabic nonsense words are presented on tape and the subject is required
to identify the first, middle, or last phoneme in the word. To determine the
subjects' ability to apply basic inflectional and derivational rules of
morphology, they were also given the Berry-Talbott Test of Language (1966).
Normal children have been found to develop the morphological abilities tapped
by this test in a systematic progression from the easier items at the
beginning to the more difficult ones at the end, with mastery expected by age seven
or eight. 

Finally, in addition to testing them on spelling and language tasks, we 
also measured the oral reading ability of our adult subjects. For this pur- 
pose, both single word and passage reading measures were used. The reading 
subtest of the Gallistel-Ellis Test of Coding Skills was chosen to assess
reading of single words. This subtest includes real words of two types: those
of irregular (unpredictable by the more common orthographic rules)
construction, and those of regular construction that are presented in order of
increasing difficulty by syllable type. The test also includes nonsense words
that are arranged by syllable type as well. The Spache Diagnostic Reading
Scales (1972) were used to assess oral passage reading. 

We can turn now to the results of our study of adults and look first at 
their spelling performance. On the dictated spelling of real words, they did
somewhat better on the irregular than on the regular words (63% as against 57%
correct), reflecting, perhaps, the tendency to rely on the memory of the global
appearance of words often found clinically in poor readers. This possibility
is supported by the large drop to 38% correct in their performance on nonsense
words. In order to appreciate the
seriousness of the drop, it should be noted that whereas there is, of course,
only one correct response for real words, whether of regular or irregular
construction, there can be several acceptable spellings of each nonsense word.
Consider the pseudoword "lete," for example. Four spellings (lete, leet,
leat, liet) would all be scored correct.

To be sure, even with this apparent advantage for the nonsense words, one 
might still expect some discrepancy in performance between the spelling of 
nonsense and real words, in favor of the real words. However, it was evident 
from the pattern of their results that our adult subjects had not mastered the 
basic phoneme-grapheme spelling patterns that would allow them consistently to
produce even phonetically reasonable renditions of words they had not seen
before. One striking example that occurred during the reading test comes to
mind. When presented with the written word, peg, one of the adult subjects
puzzled over the word for some time and finally said: "Pig? Well, I know
it's not pig, because there's a e in the middle, but I guess I'll go with
pig." (Letter discrimination was obviously not his problem.) It is relevant
to remark at this point that our adults had little difficulty in the spoken
repetition of the auditorily presented nonsense words, performing there with
92% success. This suggests that the problems they are having with spelling
cannot be attributed to gross difficulties in speech perception (nor in
articulation, for that matter), or to some bias against using nonsense words.

Thus far we have reported only on spelling performances on dictated sin- 
gle words. The spelling of the adults on the spontaneous writing samples was 
somewhat better than on the dictated words — 78% of the words were spelled 
correctly. But it was apparent that whatever improvement occurred here could 
probably be attributed to the tendency of the subjects to limit their
productions to words that they thought they could spell correctly. A large
proportion of one subject's output even included his copying of the wording of the
printed test directions, for example.

An informal qualitative analysis of the errors made by the adults on the 
writing samples showed clearly that they had serious linguistic problems over 
and above their poor spelling. Approximately a third of the errors reflected
grammatical weaknesses: difficulties with function words accounted for 12% of
the errors, and omissions or substitutions of inflectional endings accounted
for another 21%.

In the light of these findings, it is of interest to note that our adults 
passed only 63% of the items on the Berry-Talbott test, which measures
inflectional and derivational knowledge. In contrast, in a study recently completed
by one of us (Rubin, 1984), a group of 60 first graders was able to pass 57%
of the same items. Moreover, the young children did well on the lower level
items and less well on the higher, thus showing a systematic development of
morphemic understanding. The adults, on the other hand, often performed
poorly on even the simplest categories like plural and past tense inflections,
though they were able to use them correctly in their spontaneous speech. It
would seem that when they have to do even a moderate degree of analysis of
their language, whether written or spoken, their linguistic abilities are
strained to the point of breakdown.

In view of their performance on the morphemic analysis task, it was not
surprising to find that language analysis at the phonemic level was especially
trying for these adults. On our very simple phoneme analysis task, which is
similar to those used in training kindergarteners and first graders in phoneme
segmentation, only 58% of their responses were correct. Moreover, they
found the task frustrating and unpleasant.

This inability of adults with literacy problems to perform well in a task
requiring explicit understanding of the internal structure of words has also
been found by other investigators (Byrne & Ledez, 1983; Marcel, 1980;
Morais, Cary, Alegria, & Bertelson, 1979; Charles Read, personal
communication [...]). Particularly convincing in this connection is the finding by
Morais and associates (1979) that the performance of first graders in the
[...] of school was slightly better in both phoneme deletion and addition
tasks than that of the adult illiterates in their study.

We now turn to a comparison of the reading and spelling of our adult
subjects. We found that their reading of single real words was better than their
spelling, as would be expected in any comparison of recognition and production
measures. But the pattern of performance was quite similar. Real words were
read with greater accuracy than nonsense words, just as had been the case with
spelling. But perhaps the most telling result was found in a direct
comparison of reading and spelling as a function of word type. Whereas the reading
of real words, as we have said, was generally better than the spelling, the
situation was quite different in regard to the nonsense words. On nonsense
words, for which a structural analytical approach is obligatory rather than
optional as it may be in dealing with real words, the performance of the
adults on both reading and spelling was virtually identical in quality and
quite poor.



In short, the adult subjects, like the poorest invented spellers in the
kindergarten study, appeared to have only the dimmest understanding of the
phonemic structure of words. Qualitative analysis of their successes in
reading suggested that memorizing words as global entities was a favored strategy.
The analysis revealed, for example, that they often did well on words they had
seen frequently before, even when the words were polysyllabic or irregular in
construction. They might, for example, read correctly a ten-letter,
trisyllabic word with uncommon spelling like photograph, but be at a loss to deal with
a simple trigram like peg that they had never encountered before. The same
kind of contrast between performance on complex practiced words and unknown
but relatively simple words was also apparent in their spelling.


Their oral reading of connected text on the Spache test (1972) was
clearly superior to their reading of single words, suggesting, as has been found in
other studies of poor readers (Perfetti & Hogaboam, 1975), that our adult
subjects were relying heavily on context to assist them in apprehending familiar
words. At all events, reading, like spelling, was patently a struggle for
these men, generating once more grammatical errors not present in their
everyday speech. Examination of their errors in oral passage reading revealed a
day speech. Examination of their errors in oral passage reading revealed a 
pattern of incorrect use of inflectional morphemes and functors much like that 
noted in their spontaneous writing samples. 

Educational Implications 

In our introductory remarks, we have advanced our reasons for expecting 
that the acquisition of literacy would be related to a certain degree of 
linguistic sophistication, that is, to the ability to deal with the structure 
of language in an analytic manner. In the first study reported here we found 
that among kindergarteners, the better spellers, like the better readers among 
older children in previous investigations, have developed this ability to a 
higher degree than those who spell more poorly. In the other study, a group 
of adults in a community literacy class, who showed no serious deficiencies in 
everyday speaking and listening, were extremely poor spellers and at the same
time were also deficient in various tasks requiring analytic linguistic
skills. They found phonological segmentation tasks particularly troublesome.



The ability to stand back from one's language and analyze its structure
apparently does not develop naturally as a result of cognitive maturation. It
must be learned or taught. But there are several ways in which it can be
learned or taught. Many children, as we have pointed out elsewhere (Liberman
& Shankweiler, 1977), will develop linguistic insights as a consequence of
their experiences in learning to read an alphabetic orthography. But even for
those children, the process might have been made easier and faster by giving
them explicit instruction. For other children, including especially those who
for whatever reason are at risk linguistically, more explicit instruction will
be required if they are ever to attain true literacy, to be able, that is, to
deal effectively with unknown or previously unlearned words, which is what an
alphabetic orthography is all about. The adults in our community literacy
class can still profit from explicit instruction in all aspects of the
structure of language (phonological, syntactic, and morphological). But they would
have been spared much grief and embarrassment if their deficits had been
discovered and addressed in kindergarten instead.

Several investigations have now demonstrated that linguistic awareness 
can be trained (Bradley & Bryant, 1983; Olofsson & Lundberg, 1983;
Vellutino, this volume, 1983) and will make a difference in reading acquisition.
Recent studies make it clear that phonological (Fischer, Shankweiler, &
Liberman, in press) and morphological (Carlisle, 1984; Fischer et al., in press;
Rubin, 1984) knowledge make a difference in spelling proficiency. At all
events, it now seems reasonable to suggest, in view of the present findings
and in light of the characteristics of the alphabetic orthography and how it
relates to language, that more and earlier training in all aspects of
linguistic sensitivity may promote better spelling and should be encouraged.
Kindergarten is not too soon to start.

Footnotes

1 The ways in which the individual characters in these orthographies
represent their respective languages are more complex than can be described here.
It should be noted that the segments represented by the characters in the
Chinese logography and the Japanese kanji are more correctly defined as
morphemes rather than words; similarly, the segments in the Japanese kana are
better described as moras than syllables. However, it is sufficient for our
purposes, and sufficiently accurate, to speak of the word in the first case
and the syllable in the second.

2 The representation in the ideal speaker-hearer's lexicon is often
morphophonological, that is, the word is represented as a sequence of
systematic phonemes divided into its constituent morphemes. For example, the
words heal, health, healthful have the morphophonological representations
/hel/, /hel+θ/, /hel+θ+ful/, respectively (see Liberman et al., 1980, for a
more complete discussion of the morphophonological nature of orthographies).






ERRORS IN SHORT-TERM MEMORY FOR GOOD AND POOR READERS 


Susan Brady,† Virginia Mann,†† and Richard Schmidt†††


Abstract. Good and poor readers in the second and third grades repeated
4-item lists of consonant-vowel (CV) syllables in which each consonant shared
0, 1, or 2 features with other consonants in the string. While poor readers,
as in previous studies, performed less accurately than good readers, the
nature of their errors was the same: Both groups revealed significant effects
of phonetic similarity and adjacency on the incidence of errors. These
findings suggest that poor readers employ a phonetic coding strategy in
short-term memory, as do good readers, though less skillfully.

Children who have difficulty learning to read have consistently been
found to perform less well than good readers on a wide variety of short-term
memory (STM) tasks (Bauer, 1977; Brady, Shankweiler, & Mann, 1983; Hogaboam
& Perfetti, 1973; Jorm, 1983; Katz, Healy, & Shankweiler, 1983; Liberman,
Shankweiler, Liberman, Fowler, & Fischer, 1977; Mann, Liberman, &
Shankweiler, 1980; Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979;
Torgesen, 1982). If, as is generally agreed, text comprehension depends on
the ability to preserve temporarily the phonetic form of linguistic input in
STM, then a short-term memory deficit may be central to the comprehension and
reading fluency problems of poor readers.

Research in the last decade suggests that the deficits poor readers
display on STM tasks are related to less efficient phonetic coding processes.
This conclusion is supported by three lines of evidence. First, the contrast
between reading groups for performance on STM tasks seems to be restricted to
procedures with "phonetically recodable" stimuli. When visual stimuli are
selected that do not lend themselves to phonetic coding, the performances of
good and poor readers are the same. For example, for stimuli such as
photographs of strangers, nonsense doodle drawings, or symbols from an
unfamiliar writing system, recall by good and poor readers is comparable
(Katz, Shankweiler, & Liberman, 1981; Liberman, Mann, Shankweiler, &
Werfelman, 1982; Vellutino, Pruzek, Steger, & Meshoulam, 1973). Similarly, in
memory for auditory stimuli such as tones that are not readily recoded
phonetically, poor readers perform as well as good readers on STM tasks
(Holmes & McKeever, 1979). The difference between reading groups for recall
of "phonetically recodable" stimuli and the lack of differences in
performance for stimuli that are "phonetically unrecodable" highlight the
poor readers' difficulty with the use of a phonetic code.

A second line of evidence finds that manipulations of certain phonetic
properties of stimuli in an STM task generally have less effect on the
performance of poor young readers than on that of the good readers (Brady et
al., 1983; Liberman et al., 1977; Mann et al., 1980; Mark, Shankweiler,
Liberman, & Fowler, 1977; Olson, Davidson, Kliegl, & Davies, 1984;
Shankweiler et al., 1977). With stimuli in which there is a low density of
phonetic confusability (i.e., nonrhyming items), good readers' recall is
superior. However, if a high density of confusability is present, the
performance of good readers is impaired much more than that of poor readers.
It has been supposed that this interaction, also found for adults (Baddeley,
1966; Conrad, 197?), results from the fact that the good reader is better
able to form a sufficient phonetic code for the maintenance of information in
STM. Stimuli that introduce confusion among the phonetic representations in
STM, such as strings of rhyming words, therefore tend to have a greater
effect on the performance of the more skilled readers.

Third, and of primary concern to us here, is evidence that poor readers
may make use of a phonetic code, and not some other coding strategy, but may
do so less accurately or efficiently than do good readers (Katz, 1982).
Earlier, Conrad (1971) had reported that children 6 years of age or older
produced the same pattern of results on STM tasks as do adults. These children
had better recall for nonrhyming sets of pictures than for rhyming sets. In
contrast, children younger than 6 years old did not show a difference in
recall for the nonrhyming set. This study raised the question of whether young
children might initially be using some other, non-phonetic, coding strategy in
short-term memory. However, more recent research with even younger children
(4 yrs) has found the adult pattern, suggestive of phonetic coding (Alegria &
Pignot, 1979). That is to say, in the Alegria and Pignot experiments
4-year-old children recalled nonrhyming items better than rhyming items,
leading the authors to conclude that by 4 years of age children are already
using a phonetic code to store and organize information in short-term memory.
Thus, at present, questions must be raised as to whether there are
developmental changes in the type of code employed in short-term memory and,
by extension, whether poor readers are indeed using a nonphonetic strategy.

Several findings compel us to reconsider this issue. Using a paradigm
that tested recall for time periods longer than the assumed limits of STM, one
investigation (Byrne & Shea, 1979) did obtain evidence that poor readers were
able to use a phonetic code (albeit poorly) when forced to do so by pseudoword
stimuli, but otherwise tended to favor a semantic code (though this has not
been replicated [Winbury, 1984]). However, when errors made by poor readers
were examined for a standard short-term memory task, there was no indication
that poor readers were using a semantic strategy (Brady et al., 1983).
Instead, their errors, like those of good readers, indicated that the stimuli
were being processed phonetically. Out of 437 intrusion errors (items that
were not in the original list) by both reading groups, only one appeared to
have been a possible semantic error ("station" for "train"); the vast majority
of the remainder could be accounted for in terms of the phonetic units present
in the particular string and the preceding string. What was noteworthy was
that both reading groups appeared to be using a phonetic coding strategy,
although recombinations (transpositions) of the phonetic information were
significantly more frequent for the poor readers.

In sum, present evidence is consistent with a hypothesis that the deficit
of poor readers on STM tasks has its origin in deficiencies in creating and
maintaining a phonetic code. The poor readers' specific difficulty with
"phonetically recodable" stimuli, reduced sensitivity to rhyme, and greater
frequency of phonetic errors of transposition all support this conclusion.

Most of the studies of short-term memory for good and poor readers have
assessed performance by calculating the number of items correctly reported.
Analyzing the nature of errors, rather than just the incidence of errors,
offers a way to determine how good and poor readers are actually functioning
in STM tasks. In the last study mentioned above (Brady et al., 1983), a
preliminary effort was made to do this and the results proved interesting. The
occurrence of transposition errors for good and poor readers pointed to the
use of a phonetic code by both reading groups rather than revealing an
alternate strategy for the poor readers. Yet the higher frequency of these
errors for poor readers was suggestive of reading group differences in
phonetic processing skills. However, the lists in that study had not been
designed to allow the rigorous analysis of errors carried out in recent
studies of STM in adults (cf. Drewnowski, 1980; Ellis, 1980), so the results
are tentative.

Given the importance of short-term memory for language processing and the
evidence of STM deficits in poor readers, we thought it worthwhile to conduct
a more analytic study of errors in STM for good readers and poor readers.
Accordingly, two experiments were conducted in which the nature of the items
in the memory lists was controlled. We manipulated the phonetic similarity of
the initial consonants as suggested by Ellis (1980). Nonsense syllables were
selected as stimuli for two reasons: 1) The small number of semantic errors
in a study of children's memory (Brady et al., 1983) and the complete lack of
such errors by adults (Drewnowski & Murdock, 1980) suggest that semantic
information is not the critical dimension. This conclusion is also supported
by the finding that recall level by good and poor readers for sentences is not
influenced by whether the sentences are meaningful or anomalous (Mann et al.,
1980). 2) With nonsense syllables it is easier to control the distinctive
features of the stimuli. Following Ellis's design, lists of nonsense
syllables were constructed such that items in the strings shared 0, 1, or 2
features of the initial consonant. This permitted us to determine the effects
of the phonetic structure of the materials on the pattern of errors, including
that of phonetic similarity on the incidence of transposition errors between
adjacent items. Analysis of such effects as they relate to reading ability
and the accuracy of recall can provide information on the use of phonetic
coding by good and poor readers.

Experiment 1 

Methods 

Subjects. In Experiment 1, the subjects were second-grade children from
two elementary schools in a suburban school district in Rhode Island. A
school reading specialist, a principal, and the classroom teachers helped to
pre-select the poorest readers and the best readers from the second-grade
classes. In a supplementary screening procedure, the Word Attack and Word
Recognition subtests of the Woodcock Reading Mastery Tests, Form A (Woodcock,
1973), and a test of receptive vocabulary, the Peabody Picture Vocabulary
Test-Revised (PPVT-R; Dunn, 1981), were administered to the children.

Inclusion in the study was determined by the following criteria: 1) To
insure valid classification as a good or poor reader, the scores on the two
Woodcock subtests had to be consistent. 2) In order to restrict the range of
IQ scores, only subjects with scores from 90 to 135 on the PPVT were eligible
for further testing. 3) Given the evidence that STM span increases with age
(cf. Dempster, 1981), subjects were selected whose ages fell within the
limited range of 88 to 100 months.

Twenty-eight children satisfied the requirements for participation in the
study. Based on the scores that were obtained on the reading tests, two
groups were formed that were non-overlapping in reading level. The 14
children who qualified as good readers were well beyond the end of
second-grade reading performance (testing was done in the spring) with a mean
reading grade level of 5.0. The 14 children labeled poor readers had an
average reading grade level of 2.1, and lagged considerably behind their
peers. The IQ scores, as determined by the PPVT, did not differ
significantly. The mean IQ score for the good readers was 114, for the poor
readers 111. The reading groups also did not significantly differ in age. The
good readers had a mean age of 96.1 months, the poor readers had a mean age of
96.1 months.

Materials and procedure. The materials comprised lists of four nonsense
syllables presented auditorily in three practice trials and twelve test
sequences. In all sequences the stimuli were the consonant-vowel (CV)
syllables /Sa/, /Za/, /Ga/, and /Ka/. In these syllables, the vowel is held
constant, and the initial consonants share 0, 1, or 2 phonetic features. The
four syllables can be combined into 6 possible pairs, two of which share 0
features, two of which share 1 feature, and two of which share 2 features (as
detailed in Table 1). The trials consisted of randomizations of these four
syllables in which each consonant occurred only once per list and three times
at each of the serial positions 1 to 4. For the six stimulus pairs, each
occurred twice (once in each order, e.g., /Sa/, /Za/ and /Za/, /Sa/) at each
of the serial positions 1 and 2, 2 and 3, and 3 and 4.
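
The two counterbalancing constraints just described can be checked
mechanically. The sketch below is only an illustration: the report does not
say how the twelve test sequences were chosen, so the use of the twelve even
permutations of the four syllables is an assumption, as are all function
names. Under that assumption each syllable falls three times at every serial
position, and every ordered pair of syllables falls exactly once at each pair
of adjacent positions, which amounts to each of the six pairs occurring twice
(once in each order).

```python
from collections import Counter
from itertools import permutations

SYLLABLES = ("S", "Z", "G", "K")  # initial consonants; the vowel is constant


def is_even(perm, base=SYLLABLES):
    """True if perm is an even permutation of base (even number of inversions)."""
    idx = [base.index(x) for x in perm]
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return inversions % 2 == 0


def balanced_lists(base=SYLLABLES):
    """Assumed construction: keep the 12 even permutations of the 4 syllables."""
    return [p for p in permutations(base) if is_even(p, base)]


def position_counts(lists):
    """Count how often each syllable appears at each serial position (1-4)."""
    return Counter(
        (pos, item) for lst in lists for pos, item in enumerate(lst, start=1)
    )


def adjacent_pair_counts(lists):
    """Count each ordered pair at each adjacent slot (1-2, 2-3, 3-4)."""
    return Counter(
        (pos, lst[pos - 1], lst[pos]) for lst in lists for pos in range(1, len(lst))
    )
```

Any other set of twelve sequences could be passed to position_counts and
adjacent_pair_counts in the same way to confirm that it is balanced.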



Table 1 

Experiment 1: The consonant pairs described in terms of
shared distinctive features

                  Distinctive Feature
Consonant                                     Number of
Pairs        Voicing    Place    Consonant    Shared Features

SZ              -         +          +              2
GK              -         +          +              2
SK              +         -          -              1
ZG              +         -          -              1
SG              -         -          -              0
ZK              -         -          -              0
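
The pair classification in Table 1 can be reproduced by counting feature
matches. This is a minimal sketch, assuming conventional voicing, place, and
manner values for the four consonants; treating the third feature column as
manner of articulation is a reading of the table rather than something the
text states, and the names FEATURES, shared_features, and pair_table are
hypothetical.

```python
from itertools import combinations

# Assumed feature values (standard phonetic descriptions, not from the report).
FEATURES = {
    "S": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
    "Z": {"voicing": "voiced", "place": "alveolar", "manner": "fricative"},
    "G": {"voicing": "voiced", "place": "velar", "manner": "stop"},
    "K": {"voicing": "voiceless", "place": "velar", "manner": "stop"},
}


def shared_features(a, b, features=FEATURES):
    """Number of the three features on which consonants a and b agree."""
    return sum(
        features[a][name] == features[b][name]
        for name in ("voicing", "place", "manner")
    )


def pair_table(features=FEATURES):
    """Map each unordered consonant pair to its number of shared features."""
    return {a + b: shared_features(a, b, features) for a, b in combinations(features, 2)}
```

With these assumed values the six pairs split two, two, and two across 2, 1,
and 0 shared features, as the table requires.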



The practice trials and test sequences were read with a neutral
intonation by a phonetically-trained male speaker and recorded on magnetic
tape. The materials were presented to subjects through headphones. Within
each list the syllables were spoken with a neutral prosody at the rate of one
per second.

Each child was tested individually for two sessions in a small room pro- 
vided by the school. The first session consisted of the screening procedure, 
the second the memory task. In the second session the practice trials were 
presented (repeated if necessary), followed by the test sequences. For each 
trial, subjects were instructed to repeat the items in the order they had been 
presented, as soon as the list ended. 

Results and Discussion

In a preliminary analysis of the data, the number of correct responses
was tabulated in terms of item order and serial position, as is customarily
done in studies of short-term recall in good and poor readers. The
innovation was to further analyze the errors qualitatively in relation to the
syllables that were adjacent to the target syllable in the test sequence, and
in relation to the phonetic features of the target syllable.

Analysis of correct responses. Since the same items were presented on
each trial, varying only in terms of order, an order-correct scoring procedure
was adopted. A response was considered correct only if it had been assigned
to the appropriate serial position. Figure 1 shows the mean number correct at
each serial position for each group of subjects. Consistent with earlier
studies, the good readers were notably more accurate overall than the poor
readers, F(1,26) = 9.99, p = .004. Both groups showed a significant effect of
serial position, F(3,78) = 36.^6, p < .001.
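
As a rough illustration of order-correct scoring, the sketch below marks a
response as correct only when the right syllable is reported at the right
serial position, and then totals the scores by position across trials. The
function names and the list-of-strings data layout are assumptions made for
the example, not the authors' scoring materials.

```python
def order_correct(target, response):
    """Return one 0/1 score per serial position (1 = right item, right place)."""
    return [
        int(i < len(response) and response[i] == item)
        for i, item in enumerate(target)
    ]


def correct_by_position(trials):
    """Sum order-correct scores over (target, response) pairs, per position."""
    totals = [0] * len(trials[0][0])
    for target, response in trials:
        for i, score in enumerate(order_correct(target, response)):
            totals[i] += score
    return totals
```

Dividing each total by the number of trials gives the proportion correct at
that serial position for one subject.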




Figure 1. Experiment 1: The mean number of items correctly reported plotted
by serial position.

Analyses of covariance using IQ and age as the covariates were conducted
to evaluate whether the obtained differences in performance might be
attributed to differences in age or intelligence between good and poor readers
in our sample. Neither analysis altered the pattern of results: with IQ as
the covariate the good readers were still superior in recall, F(1,25) = 8.72,
p = .007; likewise with age as the covariate, F(1,25) = 9.47, p = .005.

Thus, it was clear that the characteristic differences in STM recall be- 
tween good and poor readers were obtained with the present task. Having 
replicated this pattern, we turned our attention to the nature of the errors 
for both reading groups. 

Error analysis. The errors were first categorized as either misplacements
of phonetic information that had been in the string, or as errors of
omission (no response) or substitution errors (phonemes that were not in the
original list). For both good and poor readers, the majority of errors were
of the first category, phonemes that had occurred elsewhere in the string,
F(1,26) = 56.01, p < .001. Since the vowel was the same for all items, it is
not possible to determine whether these order errors are phoneme
transpositions or entire syllable order errors.

However, the construction of the present experiment, using phonemes that
differ systematically in shared features, does allow us to examine the
conditions under which these order errors occurred. Two parameters were
measured. First, an error was evaluated as having been, in the original
string, adjacent to the target or nonadjacent (e.g., target string "Ga, Za,
Ka, Sa," reported string "Ga, Sa, Ka, Za": original location of misreported
items was nonadjacent to error position). Second, an error was scored in terms
of the number of features shared between the substitute response and the
target item (e.g., error /G/, target /K/: two features in common).
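
The two scoring parameters can be made concrete with a short sketch. It is
an illustration under stated assumptions, not the authors' scoring procedure:
the feature values are conventional phonetic descriptions, responses are
assumed to have the same length as the target list, None stands for an
omitted item, and the names FEATURES, shared_features, and score_errors are
hypothetical.

```python
# Assumed (voicing, place, manner) values for the four initial consonants.
FEATURES = {
    "S": ("voiceless", "alveolar", "fricative"),
    "Z": ("voiced", "alveolar", "fricative"),
    "G": ("voiced", "velar", "stop"),
    "K": ("voiceless", "velar", "stop"),
}


def shared_features(a, b):
    """Number of features on which consonants a and b agree."""
    return sum(x == y for x, y in zip(FEATURES[a], FEATURES[b]))


def score_errors(target, response):
    """Describe every error in one trial.

    An order error is a reported consonant that was in the list but at another
    position; it is scored for adjacency of its original position to the
    error position and for features shared with the target consonant.
    """
    scored = []
    for pos, (t, r) in enumerate(zip(target, response), start=1):
        if r == t:
            continue  # correct item in the correct position
        if r in target:
            source = target.index(r) + 1  # original serial position of r
            scored.append({
                "position": pos, "target": t, "response": r, "kind": "order",
                "adjacent": abs(source - pos) == 1,
                "shared": shared_features(t, r),
            })
        else:
            kind = "omission" if r is None else "substitution"
            scored.append({
                "position": pos, "target": t, "response": r, "kind": kind,
                "adjacent": None, "shared": None,
            })
    return scored
```

Applied to a target string G, Z, K, S reported as G, S, K, Z, the function
returns two order errors, both nonadjacent and both sharing two features with
their targets.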

As shown in Figure 2, the source (i.e., original location) of substituted
stimuli was a significant factor both for good and poor readers. Errors were
significantly more likely to involve a syllable adjacent to the target than a
nonadjacent syllable, F(1,26) = 11.14, p = .002. This suggests that the
subjects had retained some information about the relative position of items in
the original string even when they were unable to make a fully accurate
report.

The second qualitative scoring procedure, which evaluated the phonetic
similarity between errors and target stimuli, is more central to our interest
in the coding skills of good and poor readers. Would the incidence of order
errors be greater for stimuli that shared two features than for those with one
or none in common? A significant effect of phonetic similarity was obtained,
F(2,52) = 7.93, p < .001, underscoring the phonetic basis for storage in STM.
Yet the effect did not differ for good and poor readers, suggesting that both
reading groups were relying on phonetic representation.

In Figure 3, the effects of feature similarity on the occurrence of
errors are plotted for the order errors involving both adjacent and
nonadjacent target stimuli. Here it can be seen that there is a tendency for
nonadjacent errors to be more influenced by phonetic similarity than adjacent
errors (adjacency x similarity: F(2,52) = 2.72, p = .075). Another way of
viewing this tendency is in terms of the distinction between memory for item
identity and memory for order (Healy, 1975). In this analysis, adjacent errors
are more likely to be strictly order confusions, while more distant mistakes,
which are subject to effects of phonetic similarity, are more likely to be
"item" confusions reflecting poor retention of information. This pattern was
obtained equally for good and poor readers as indicated by the lack of an
interaction between reading group, similarity, and adjacency (p > .5).

Figure 2. Experiment 1: The mean number of times the error consisted of an
item from a nonadjacent or an adjacent position.

Figure 3. Experiment 1: Analysis of feature similarity effects for adjacent
and nonadjacent errors.

In sum, in this experiment poor readers were found to recall
significantly less information than did the good readers. However, like those
of good readers, their errors consisted largely of information that had been
in the string, reported in the wrong order, rather than errors of omission or
substitution. Further analyses showed that these order errors revealed
significant effects of adjacency and of phonetic similarity. Thus the errors
of poor readers show the same systematic effects of processing as do the
errors of good readers, but occur at a higher rate. The implication of this
would seem to be that poor readers employ the same coding strategy as do good
readers but less effectively.

Experiment 2 

To determine the generality of the results to other consonants and other
classes of phonemes, good and poor readers were tested on a second set of
items consisting of syllables that started with the consonants /M/, /N/, /B/,
and /K/. This task was designed to allow a further investigation of the
phonetic factors in STM. In addition to a same-vowel condition, a mixed-vowel
set was employed to further explore the nature of order errors. In this
condition, it is possible to analyze whether order errors are transpositions
of phonetic segments (e.g., /Mi/, /Nae/ for /Ni/, /Mae/) or whole-syllable
misorderings (e.g., /Mae/, /Ni/ for /Ni/, /Mae/). In this way we hoped to make
a more fine-grained analysis of the processing strategies of good and poor
readers.
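
The distinction between segment transpositions and whole-syllable
misorderings can be sketched as a simple classifier for the mixed-vowel
condition. This is an illustrative reading of the error categories, not the
authors' scoring code: syllables are assumed to be (consonant, vowel) tuples,
None marks an omission, scoring is by consonant as in the consonant analysis,
and the function name and labels are hypothetical.

```python
def classify_consonant_errors(target, response):
    """Label each serial position of one mixed-vowel trial.

    target and response are equal-length lists of (consonant, vowel) tuples;
    a response entry of None means nothing was reported at that position.
    """
    target_consonants = [consonant for consonant, _ in target]
    labels = []
    for t, r in zip(target, response):
        if r is not None and r[0] == t[0]:
            labels.append("correct")  # consonant scoring ignores the vowel
        elif r is None or r[0] not in target_consonants:
            labels.append("omission_or_substitution")
        elif r in target:
            # the intact syllable was in the list, but at another position
            labels.append("whole_syllable_misordered")
        else:
            # a consonant from the list recombined with a different vowel
            labels.append("transposition")
    return labels
```

Counting the labels over all trials for a subject would give the kind of
breakdown by error type that the mixed-vowel condition makes possible.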

Subjects 

The same subjects were recruited for participation in Experiment 2,
conducted in the spring of the following school year (3rd grade). Four
children were no longer available, two good readers and two poor readers. The
remaining children were reevaluated for inclusion in the study, in accord with
the criteria outlined for Experiment 1. For all subjects, placement in a
reading group was the same as it had been the previous year. Additional
children were screened to increase the number of subjects in each group. Two
children qualified as good readers and four as poor readers, bringing the
group sizes to 14 good readers and 16 poor readers.

As before, the reading groups were non-overlapping in reading level. The
14 good readers had a mean reading grade level of 8.7. The 16 children who
were labeled poor readers had a mean reading grade level of 2.9. The PPVT IQ
scores did not differ significantly for the reading groups: good readers,
mean = 114; poor readers, mean = 111. Nor did the ages significantly differ:
good readers, mean = 107.2 mos.; poor readers, mean = 108.3 mos.

Materials and Procedure 

Experiment 2 was designed to be exactly parallel to Experiment 1 with
different stimulus sets. A trial again consisted of four nonsense syllables
presented auditorily, and the subjects were asked to repeat the list in the
order of presentation. There were now two conditions, a same-vowel set and a
mixed-vowel set. The construction of the test sequences was identical to
Experiment 1. The initial consonants of the test items in each set again had
0, 1, or 2 phonological features in common with the other stimuli selected
(see Table 2). In the same-vowel condition the stimuli were /Ma/, /Na/, /Ba/,
and /Ka/. In the non-rhyme set, the consonants were randomly paired with the
vowels /i/, /e/, /o/, /o/, /a/, /ax/, /oi/, /ei/, /ae/, and /u/ (with the
stipulation that CV combinations sounding like real words were excluded).



Table 2 

Experiment 2: The consonant pairs described in terms of shared distinctive
features

                  Distinctive Feature
Consonant                                     Number of
Pairs        Voicing    Place    Consonant    Shared Features

MN              +         -          +              2
MB              +         +          -              2
NB              +         -          -              1
BK              -         -          +              1
MK              -         -          -              0
NK              -         -          -              0



The preparation of stimulus tapes and the method of testing also mirrored
the procedures adopted in Experiment 1, except that each child was tested for
three sessions (one screening session and two memory-task sessions). In each
memory-task session a subject would have three practice trials (repeated if
necessary) and six test trials for one condition (e.g., same vowel), followed
by three practice trials (or more) and six test trials for the other condition
(e.g., mixed vowel). The order of test conditions was reversed for the second
session. Within each reading group, half of the subjects began with the same
vowel condition, half with the mixed vowel set.

Results and Discussion

The scoring methods used in Experiment 1 were employed. The results will
be presented jointly for the two vowel conditions. First, the correct
response data will be discussed, followed by the error analysis results.

Analysis of correct responses. In Figure 4 the mean number correct at
each serial position is plotted for the two vowel sets, same vowel and mixed
vowel. In these plots, the data have been analyzed only for consonant
information. Thus, in this analysis the vowel responses were not scored.
Except for recall being a little better for all subjects, as expected for
older children, the results replicate those in Experiment 1. Good readers were
superior to poor readers in recall for both the same vowel set, F(1,28) =
1-J.P5, p < .001, and the mixed vowel set, F(1,28) = 5.64, p = .025. In
addition, the serial position effect was significant for both vowel
conditions: same vowel, F(3,26) = 19. V, p < .001; mixed vowel, F(3,26) =
lb. 03, p < .001.

Figure 4. Experiment 2: The mean number of consonants correctly reported
plotted by serial position.

Interestingly, recall of consonants appears to be independent of the
vowel environment. There was a striking lack of difference in error rate for
consonants for the two vowel sets, F(1,28) = .00, p > .99, as shown in Figure
5. That this held for both reading groups is supported by the lack of a group
x test interaction, F(1,28) = 2, 2 <3i p = .13.

In most memory studies, the consonants are not scored in isolation, but
rather the entire response is scored as correct or not. We will now present
the data in this fashion, counting a response as correct only if both the
consonant and vowel were accurately reported. Scoring the entire syllable, a
substantial difference in accuracy emerges for the memory sets, F(1,28) =
1^6.^9, p < .001. This can be seen in Figure 6. Of course, the error rate
for the same-vowel set is the same as plotted for the consonant scoring, since
subjects do not make mistakes on the vowels in this task. This is not so for
the mixed vowel set, where the amount of information to be recalled is much
greater. Subjects must retain both the consonant and the vowel, and with
nonsense syllables this cannot be facilitated by semantic information. The
increased memory load is reflected in the lower accuracy for the mixed vowel
condition.

Figure 6. Experiment 2: The mean number of correct responses (total
syllable) reported plotted for each reading group.

Good readers show a greater improvement on the easier memory task (same
vowel set) than do the poor readers, F(1,28) = 8.59, p = .007. The particular
pattern of errors by good and poor readers for vowels and consonants will be
discussed below in the error analysis section. Further, the results of the
same vowel and mixed vowel sets in relation to other studies investigating the
effects of rhyme on recall will be addressed in the conclusion.

In sum, the results of the correct response analyses show that good
readers have performed significantly better on both conditions (same and
mixed-vowel) and for both scoring techniques (consonant alone and whole
syllable).



Error analysis. Turning to analyses of the types of errors, we found, as
in Experiment 1, that the majority of errors consisted of misorderings of the
items in the string, rather than substitutions of new items or omissions.
This effect was significant both for the same vowel condition, F(1,28) =
135.1 J, p < .001, and the mixed vowel condition, F(1,28) = 196.08, p < .001.

However, in the mixed vowel set, in which we can examine vowel errors,
the errors for consonants and vowels differed somewhat. As noted, the reading
groups differed significantly on the consonant errors, F(1,28) = 5.64, p =
.025, but the difference was not obtained for vowel errors, F(1,28) = 2.51,
p = .124. Although both groups produced many errors on the vowels, error rate
did not distinguish the groups.

Table 3 displays the ways in which the consonants and vowels vary as to
error type. First, as stated earlier, it can be seen that for the consonants,
very few errors consisted of substitutions or omissions. With such a limited
data set, this is as would be expected. For the vowels a larger set of
stimuli was possible in any particular string, and a fair number of
substitution errors occurred. Second, few of the errors for either reading
group consisted of entire syllable misorderings, and no group difference was
observed for this error type, F(1,28) = 1.90, p = .179. Third, the majority of
errors consist of transpositions of the available consonants and vowels, which
create new syllables. The significant difference between good and poor readers
arises from the greater frequency of transposition errors for the poor
readers.




Table 3

Nonrhyme condition, Analysis of Errors

Consonant Errors

Reading   Mean total    Mean errors:     Mean errors:           Mean errors:
Group     errors for    whole syllable   transposition          omission and
          consonants    in wrong         (consonant in wrong    substitution
                        position         position with new
                                         vowel)

Good        16.93          2.07             13.07                  1.79
Poor        22.00          3.44             17.06                  1.50

Vowel Errors

Reading   Mean total    Mean errors:     Mean errors:           Mean errors:
Group     errors for    whole syllable   transposition          omission and
          vowels        in wrong         (vowel in wrong        substitution
                        position         position with new
                                         consonant)

Good        15.36          2.07              6.00                  7.29
Poor        19.13          3.44              8.50                  7.19


Let us next look, as in Experiment 1, at the effects of adjacency and
phonological similarity on the occurrence of consonant errors. It is clear
from Figure 7 that there is a pronounced effect of adjacency on errors in both
vowel conditions. For both good and poor readers, transpositions more often
involved consonants from adjacent syllables than from nonadjacent stimuli
(same vowel set, F(1,28) = 104.72, p < .001; mixed vowel set, F(1,28) =
11.71, p < .001). Further, there were also significant effects of phonetic
feature similarity on the incidence of transposition errors (same vowel set,
F(2,27) = 6.61, p = .005; mixed vowel set, F(2,27) = 5.^3, p = .011). Order
errors were thus more likely to occur between stimuli that shared phonetic
information. In this experiment, in contrast to the first experiment, the
effects of phonetic similarity were evident both for the adjacent errors and
for the nonadjacent errors. Therefore, in Figure 8, the data for similarity
effects are combined for these two error types. The reader will recall that
the effects of similarity had been more pronounced in the first experiment for
the nonadjacent errors. It is not clear whether this difference between
Experiment 1 and Experiment 2 may indicate a developmental increase in
sensitivity to phonological factors relative to adjacency effects, or whether
it may be related to the particular stimuli used in these memory tasks.

In any case, to summarize, although the poor readers made more errors
than good readers, the majority of errors for both consisted of reorderings of
the phonetic information in the string. These recombinations showed strong
influences of adjacency and phonetic similarity reflecting, we presume, the
underlying processing strategies. The kinds of errors suggest that the
inferior performance of poor readers on these short-term memory tasks is not
the consequence of a different coding strategy, but rather of a lesser degree
of skill with a phonetic strategy.

Conclusion 






Our goal was to conduct studies that would allow us to determine the
coding processes of good readers and poor readers on short-term memory tasks.
Previous work (Brady et al., 1983) had provided preliminary evidence that poor
readers, like good readers, use a phonetic code in STM. In the present
experiments this question was directly evaluated using nonsense strings in
which the phonetic similarity of the items was controlled.

Good and poor readers from the second grade (Experiment 1) and from the
third grade (Experiment 2) were tested in experiments with differing sets of
consonant-vowel syllables. The results of the two experiments were congruent:
The majority of consonant errors consisted of transpositions of one item from
the sequence for another, with significant effects of phonetic similarity.
This pattern held for both good readers and poor readers. The two groups
differed, however, in the occurrence of consonant transposition errors, which
were significantly more frequent for the children with reading problems. Both
reading groups also showed significant effects of adjacency: order confusions
for all stimuli were more likely to occur between adjacent items than between
nonadjacent items. This pattern was produced by good readers and poor readers
and probably reflects the demands of a serial order task.


In the second experiment a mixed-vowel condition was added that allowed
an examination of vowel errors. While both good and poor readers produced a
fair number of errors for vowels, the error rate did not distinguish the
reading groups.

A comment is in order about the relative difficulty of the same vowel
(rhyming) and mixed vowel (nonrhyming) sets. Our results are seemingly at
odds with what is generally found. As mentioned in the introduction, adult
subjects and good readers usually have better recall for nonrhyming items
than for lists of rhyming stimuli (e.g., Baddeley, 1966; Shankweiler et al.,
1979). The greater confusion in recall when the items are phonetically
similar has been taken as reflecting the coding processes in STM. However,
these findings have been obtained for longer sequences than were used in the
present experiment. The effects of rhyme may well depend on how taxed the
system is (a similar idea has been expressed by Hall, Wilson, Humphreys,
Tinzmann, & Bowyer, 1983). The everyday experience of rhyme facilitating
recall may be related to this. In the present task, sequences were
deliberately kept short so as to optimize the subject's ability to recall the
correct number of stimuli. We thought this would facilitate the examination of
order errors and transposition errors in the reported items. For longer
strings, subjects increasingly fail to recall the entire sequence. In that
case, partial report by the subject (e.g., giving 4 out of 7 stimuli) permits
a less structured analysis of errors.

That rhyme effects are tied to task factors has also been suggested by
results obtained for adults (see Watkins, Watkins, & Crowder, 1974). In a
paradigm similar to the present one using short strings of nonsense syllables,
Ellis (1980) did not find significant differences in the error rates for an
all-same vowel condition and for an all-different vowel condition (though with
other conditions there was a significant effect of vowel environment on error
rate). Thus, the particular type of STM task used in the present experiment
has not been found to produce the standard rhyme effect with normal adults.

Analyses of previous STM studies with children varying in reading ability
show that the levels of recall on STM tasks have consistently distinguished
reading groups. However, the effects of rhyme have proved to be somewhat
labile. As for adults, task factors appear to influence the relative
difficulty of rhyming strings. Hanson, Liberman, and Shankweiler (1984) also
found repeated rhyming strings to be easier for subjects. In their task they
also used short sequences (4 items) with the same stimuli repeated on each
trial in varying order. In addition, the "rhyme effect" appears to be
sensitive to subject characteristics (Hall et al., 1983) and to age effects
(Olson, Davidson, Kliegl, & Davies, 1984). It is evident that additional work
is necessary to understand the basis of the traditional rhyme effect in STM.

In the present study we are asking a different question: can we find
out whether good and poor readers employ different strategies in STM? What is
critical is not so much the direction of the effect of rhyme but whether poor
readers are susceptible to effects of phonetic similarity. To summarize our
findings on that question, error analyses revealed that both good and poor
readers show effects of phonetic similarity, from which we infer that both
groups use a phonetic coding strategy. The inferior performance of the poor
readers arose from a higher incidence of errors involving transpositions of
consonants sharing phonetic features. The shared features can be presumed to
place demands on phonetic coding skills; thus these results suggest that the
STM deficit associated with poor readers is related in some degree to greater
difficulty in establishing or maintaining a phonetic code.

The factors of phonetic similarity and adjacency have been noted as
important aspects of STM processing for adults as well. Several authors have
noted that errors in recall by adults show effects of feature similarity
(Cole, Haber, & Sales, 1968; Cole, Sales, & Haber, 1969; Hintzman, 1967;
Wickelgren, 1966). Further, Ellis (1980) specifically documented effects of
phonetic parameters on the occurrence of transposition errors by adults on
tasks similar to those in the present experiments. Hitch (1974) and Ryan
(1969) also reported a strong effect of adjacency in the recall errors of
adults.


Thus the same processing strategies appear to be at work for adults,
children who are good readers, and children who are poor readers. Yet a
difference exists between groups in recall level, with adults performing
better than children, and good readers recalling more than poor readers. We
are suggesting that the STM deficit of poor readers is related to less
efficient processing of phonetic information by those children, perhaps
reflecting a maturational lag (Mann & Liberman, 1984; Satz & Sparrow, 1970).
In a developmental study, Olson et al. (1984) reported the same improvements
in recall and sensitivity to phonetic factors in poor readers as in good
readers, but at an older age.

Case, Kurland, and Goldberg (1982) have offered an account of why younger
children in general perform less well that may also apply to poor readers.
These authors argue that as a child gets older, basic encoding and retrieval
operations in STM become more efficient, resulting in more functional storage
space (and in a concomitant increase in short-term memory capacity). In
support of this interpretation, they report a linear relationship between
increases in memory span and increases in speed of word repetition for normal
children 3 to 6 years old. Their position is buttressed by an additional
experiment in which adults were forced to count in an unfamiliar language.
The speed of counting for the adults was now equal to the rate of
six-year-olds, and their memory span correspondingly dropped.

It may be worth noting that individual differences in memory span are
found throughout the lifespan. Furthermore, there are some indications that
phonological skills also vary for adults and that these two findings may be
related. For example, Baddeley, Thomson, and Buchanan (1975) found that
adults' memory span could be predicted on the basis of the number of words
that the subject could read in approximately 2 seconds.
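
The relation just described can be sketched as simple arithmetic. The
function below is only an illustration of "span is roughly the number of
words read in about 2 seconds"; the function name, its parameters, and the
fixed 2-second window are assumptions made for the sketch, not the regression
reported by Baddeley, Thomson, and Buchanan.

```python
def predicted_span(words_read: int, reading_time_s: float,
                   window_s: float = 2.0) -> float:
    """Estimate memory span as the number of words read in `window_s` seconds.

    `words_read` and `reading_time_s` describe a hypothetical timed reading
    trial; `window_s` defaults to the approximate 2 seconds cited in the text.
    """
    if reading_time_s <= 0:
        raise ValueError("reading time must be positive")
    reading_rate = words_read / reading_time_s  # words per second
    return reading_rate * window_s
```

For example, a subject who reads 30 words in 10 seconds has a rate of 3 words
per second, so the sketch predicts a span of about 6 words.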


To summarize, the present work indicated that children who are poor
readers are using the same phonetic processes in STM as are good readers or
adults, but less efficiently. We would like to explore further the
relationship of phonetic processing skills to short-term memory to determine
whether that relationship can account for the developmental changes in
short-term memory that have been observed and for the memory differences for
children differing in reading ability.

Footnotes

In the remainder of the analyses for Experiments 1 and 2, the data were
likewise reanalyzed controlling for age and IQ. In no case was the
significance of the differences between reading groups reduced when age and IQ
were controlled.

The criteria for inclusion were adjusted for children a year older.
Subjects were selected whose ages fell within the range of 100 to 112 months.






LONGITUDINAL PREDICTION AND PREVENTION OF EARLY READING DIFFICULTY* 
Virginia A. Mann†



Abstract. The results of many studies suggest that early reading
problems are associated with deficiencies in certain spoken language
skills. Children who encounter reading difficulty tend to be less
able than matched good readers to perceive spoken words under noisy
conditions, less able to retain linguistic material in temporary
memory, less able to comprehend certain spoken sentences accurately,
and less fully aware of the phonological structure of spoken words.
This paper summarizes these findings and places them in the context
of the requirements of skilled reading. The results of two
longitudinal studies are reviewed, which show that inferior
performance in kindergarten tests of language skills may presage future
reading problems in the first grade. Based on these studies,
procedures are suggested for kindergarten screening and for some ways of
aiding children who, by virtue of inferior performance on the
screening tests, might be considered at risk for early reading
difficulties.

The focus of this paper is the prediction and prevention of a specific,
and quite prevalent, form of learning disability: early reading difficulty.
The contention, which will be supported by evidence from a variety of
experiments, is that deficiencies in certain spoken language skills often
limit the attainment of beginning reading skills. From the assumption that
skilled reading involves decoding a written representation of one's spoken
language, it follows that linguistic skills should be among the critical
prerequisites for successfully learning to read. The view that reading skill
is derived from primary language skill is evident from a consideration of what
skilled reading is all about. It is also supported by the findings of
experimental investigations of factors related to reading ability in adults.
It provides the theoretical perspective that has guided investigations by the
reading research group at Haskins Laboratories. Finally, it has led to the
findings presented below that certain language deficiencies in kindergarteners
are prognostic of early reading problems.



*Annals of Dyslexia, 1984, 34, 117-136.

†Also Bryn Mawr College.

Acknowledgment. This paper was presented at the 34th Annual Conference of
the Orton Dyslexia Society, November, 1983, San Diego, California. It was
prepared while the author was a Fulbright Fellow at the Research Institute of
Logopedics and Phoniatrics at the University of Tokyo, Tokyo, Japan. Much of
the research herein described was supported by NICHD Grant HD-01994 and BRS
Grant 05596 to Haskins Laboratories. I would like to acknowledge the helpful
cooperation of the children and teachers at the Bolles School in
Jacksonville, FL, and to express my gratitude to Donald Shankweiler for his
comments on an earlier draft, as well as to Isabelle Liberman for insightful
conversations.



[HASKINS LABORATORIES: Status Report on Speech Research SR-81 (1985)]




Mann: Prediction and Prevention

In particular, two linguistic factors are consistently associated with
reading ability in beginning readers (see Mann & Liberman, 1984, or Mann, in
press). These include children's degree of sophistication about the
phonological structure of language, and their ability to process spoken
language fully. I will now turn to the task of elucidating and discussing some
of the research that concerns each of these factors, to illustrate how it
informs our understanding of early reading difficulty, its prediction and
prevention.

Phonological Sophistication and Phonetic Processing:
Factors in Skilled Reading

Like all orthographies, the English alphabet functions as a symbol system
that transcribes certain units of the spoken language, and like all
orthographies it appeals to the reader's intuitive appreciation of some aspect
of linguistic structure (Hung & Tzeng, 1981; Liberman, Liberman, Mattingly, &
Shankweiler, 1980). In actuality, English orthography is a morphophonological
transcription that represents the word as a sequence of systematic phonemes
while, at the same time, capturing its constituent morphemes and underlying
phonology. It therefore maps onto a deep, abstract level of language that
corresponds rather closely to the way generative phonologists assume that
words are represented in the ideal speaker-hearer's mental dictionary, or
lexicon (Chomsky, 1964). If this characterization is accurate, the most
effective strategy for the reader of English would be to recover the abstract
lexical representation that a given string of letters "stands for," and with
it, the word's semantic and syntactic extensions. Readers may also recover
the phonetic representations of words by applying the phonological rules of
English to morphophonological representations, rules that otherwise relate
phonetic representations to morphophonological ones.

There is an advantage and a disadvantage to the way the alphabet
represents English, and both of these follow from the nature of the
relationship between letter sequences and spoken words. The advantage is that
knowledge of this relationship between printed and spoken language allows the
reader to decode not only highly familiar words, but also less familiar ones,
and even words that have never been seen nor heard before. Whereas a skilled
reader of a logography must have memorized at least two thousand distinct
characters in order to read a newspaper, a skilled reader of English need only
know a limited set of phoneme-to-grapheme correspondences and the phonological
and morphological rules of English. But the disadvantage of the alphabet is
that it requires phonological sophistication, a relatively fine-grained level
of intuitive appreciation about the phonological structure of spoken language.
To take full advantage of the alphabet, would-be readers must somehow access
their tacit knowledge of phonemes, morphemes, and phonological rules and apply
that knowledge in an explicit, artificial fashion not required for spoken
language (Mattingly, 1972). Such extensive phonological sophistication need
not be achieved by the readers of a logography, for example, who need only
know that their spoken language consists of words. Readers of the alphabet,
however, must not only know about words, but also about the internal structure
of words; that is, they must know about syllables and phonemes, and about the
complex phonological rules that relate the phonetic units we produce and
perceive to the abstract morphophonological representations that the letters
of words "stand for." Otherwise, they cannot realize the virtues of the
alphabet.

Theoretical considerations, then, reveal the relevance of phonological
sophistication to effective use of the English alphabet. The further
importance of another set of linguistic factors, spoken language processing
skills, is illustrated by some experimental evidence about the skilled reading
of words, sentences, and paragraphs. The question of whether speech recoding
mediates lexical access from print has occupied much of the research on
skilled reading (see Crowder, 1982, for a recent review). Current evidence
favors "dual access" of the lexicon by phonetic and visual processes operating
in parallel. Yet regardless of how the lexicon is accessed, it is, at base,
the morphophonological representation of a word that is being accessed, and
with it, the word's semantic extensions and syntactic properties and its
phonetic representation.

From the point of lexical access onward, the involvement of speech
processes in reading is quite clear (Perfetti & McCutchen, 1982; Shankweiler,
Liberman, Mark, Fowler, & Fischer, 1979). First of all, there is much
evidence that temporary memory for orthographic material (including isolated
letters, printed nonsense syllables, and printed words) involves recoding the
material into some kind of phonetic representation. Evidence that phonetic
representation is employed in the service of temporary memory for such
material can be found in the nature of the errors subjects make, and in the
way that a phonetic manipulation, such as creating an inordinate density of
rhyming items, can penalize performance (cf., for example, Baddeley, 1978;
Conrad, 1964, 1972; Drewnowski, 1980; Levy, 1977). Adult subjects also appear
to rely on phonetic representation when they are required to comprehend
written sentences (Kleiman, 1975; Levy, 1977; Slowiaczek & Clifton, 1980;
Tzeng, Hung, & Wang, 1977). Moreover, when reading sentences and paragraphs,
they appear to employ not only the temporary memory system, but also the
parsing system that supports recovery of the syntactic structure of spoken
sentences and discourse. This is evidenced by the significant positive
correlations between reading and listening comprehension (cf. Curtis, 1980;
Daneman & Carpenter, 1980; Jackson & McClelland, 1979).

To summarize, theoretical considerations and experimental evidence reveal
that the critical determinants of skilled reading of English include
sophistication about phonological structure, and the adequacy of certain
processes integral to spoken language comprehension. This recognition can now
provide a meaningful framework within which to consider the process of
beginning reading, and the problem of early reading difficulty.

Language Skills and Beginning Reading

What is required for success in learning to read? Obviously beginning
readers of any orthography must be able to differentiate and remember the
various orthographic shapes. Yet they must also differentiate and remember
spoken words, phrases, and sentences, because without these, there would be
nothing for the orthography to transcribe. The well-known difficulties of
congenitally deaf readers are one form of proof of the importance of spoken
language skills for beginning readers. Further proof can be found in the
relation between spoken language processing skills and success in learning to
read.

Another requirement for successful beginning reading of the alphabet, in
particular, is phonological sophistication (Liberman, Liberman, Mattingly, &
Shankweiler, 1980). Alphabetic transcription necessitates that the child not
only process spoken language effectively, but also be sophisticated about the
phonological units of language. For example, successful beginning readers
need not only to distinguish words like "cat" and "hat," but also to be
capable of holding them in memory, so that they can comprehend the difference
between "A cat is on the hat" and "A hat is on the cat." They must further
possess the linguistic sophistication that allows them to perceive the
phonological relationship between "cat" and "hat": that, among other things,
these words differ in one phoneme, the first, and share a phoneme, the final
one, which is the same as the initial phoneme in "top." Without this and
other aspects of phonological sophistication, the alphabet will remain a
mystery to them, and its virtues unrealized. Research reveals, however, that
this and other aspects of phonological sophistication pose a difficulty for
many young children, particularly those who incur early reading problems.

Having made these preliminary points about the requirements of beginning
reading, let me turn to the problem of early reading difficulty, discovering
its associated language deficiencies and predicting its occurrence. The past
decade has witnessed considerable interest in these matters, and many studies
of the psychology of early reading problems have uncovered an association
between difficulty in learning to read and difficulty within the two domains
of spoken language processing skills and phonological sophistication.

The Relation between Spoken Language Skills and Reading Difficulty

Spoken language processing skills are important to beginning readers of
all orthographies, and in accordance with this fact, a link between early
reading ability and spoken language processing ability has been established
for more than one alphabetic orthography (cf. Mann, 1982; Stanovich, 1982a,
1982b), and for syllabaries and logographies as well (cf. Stevenson, Stigler,
Lucker, Hsu, & Kitamura, 1982). For the sake of brevity, this presentation
will focus exclusively on the language processing problems found among poor
readers of English.

As for such problems, it is by now quite clear that poor readers in the
early elementary grades (i.e., children reading a half-year or more below
grade expectation) do not suffer from a general impairment in perception, or
in learning and memory, so much as from a language impairment that
specifically penalizes certain phonological processing skills. For example,
poor readers tend to be equivalent to good readers (i.e., children reading a
half-year or more above grade expectation) in audiometry scores and nonverbal
auditory perception, yet are inferior in ability to identify spoken words that
are partially masked by noise (Brady, Shankweiler, & Mann, 1983). Poor readers
also appear to have some other difficulties in recovering the phonetic
representation of words, as evidenced by their difficulty with object- and
letter-naming tasks (cf. Denckla & Rudel, 1976; Katz, 1982). Furthermore, they
have a specific difficulty with short-term memory for verbal material. For
example, poor readers do less well than good readers when temporarily
remembering printed nonsense syllables, but not photographs of faces or other
purely visual materials (Liberman, Mann, Shankweiler, & Werfelman, 1982). They
also tend to have difficulty recalling strings of spoken digits, spoken words,
and even the words of spoken sentences (Mann, Liberman, & Shankweiler, 1980),
yet have no such difficulty recalling nonverbal stimuli in a block-tapping
task (Mann & Liberman, 1984).

One deficiency that is basic to reading and other language skills is a
deficiency in use of phonetic representation in short-term memory (Liberman,
Shankweiler, Liberman, Fowler, & Fischer, 1977; Mann, 1984; Mann,
Shankweiler, & Smith, 1984). It is this deficiency that limits the poor
readers' ability to recall immediately such verbal material as syllables and
words, phrases and sentences, regardless of whether the material is heard or
read. That phonetic representation is problematic for poor readers can be seen
in the results of studies that have manipulated a phonetic characteristic of
the material being recalled, namely, the density of rhyming items. Normally,
when the to-be-recalled items do not rhyme, good readers excel with respect to
poor readers. However, when all of the items rhyme, the advantage of the good
readers is greatly reduced or even eliminated, because, for them, as for
adults, the presence of phonetic confusability penalizes their ability to
remember the words in order. Poor readers, in contrast, are less penalized by
the presence of rhyme; that is, they are not as susceptible to the stress on
phonetic representation. This result was originally demonstrated for recall
of letter strings by eye and by ear (Shankweiler et al., 1979), and has been
extended to recall of spoken word strings and spoken sentences (Mann et al.,
1980). Taken together, these two findings, that poor readers' inferior
short-term memory performance is confined to verbal material, and that there
are consistent discrepancies between good and poor readers' susceptibility to
the effects of rhyme, support a conclusion that poor readers are somehow
lacking in ability to retain the full phonetic representation of words in
short-term memory.
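
The logic of the rhyme comparison can be made concrete with a short sketch.
The two functions simply subtract error counts; the function names and every
number used with them are hypothetical illustrations, not data from the
studies cited.

```python
def rhyme_penalty(errors_rhyming: int, errors_nonrhyming: int) -> int:
    """Extra recall errors a group makes when the items rhyme."""
    return errors_rhyming - errors_nonrhyming


def good_reader_advantage(poor_errors: int, good_errors: int) -> int:
    """How many fewer errors good readers make than poor readers."""
    return poor_errors - good_errors


# Hypothetical error counts, chosen only to show the pattern described above.
good = {"rhyming": 12, "nonrhyming": 5}
poor = {"rhyming": 14, "nonrhyming": 12}

penalty_good = rhyme_penalty(good["rhyming"], good["nonrhyming"])
penalty_poor = rhyme_penalty(poor["rhyming"], poor["nonrhyming"])
advantage_nonrhyming = good_reader_advantage(poor["nonrhyming"], good["nonrhyming"])
advantage_rhyming = good_reader_advantage(poor["rhyming"], good["rhyming"])
```

With these illustrative counts, the good readers show the larger rhyme
penalty, and their advantage over the poor readers shrinks when the items
rhyme, which is the pattern the text describes.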

One consequence of a lack of effective use of phonetic representation on
the part of poor readers would seem to be a difficulty with the comprehension
of certain spoken sentences, as well as with the repetition of sentences (Mann
et al., 1984). For example, when required to act out the meaning of sentences
that contain relative clauses, poor readers tend to make more mistakes than
good readers because they make relatively more of the kinds of errors that
slightly younger children make. On the basis of this finding (and other work
we have done that suggests that poor readers do not always comprehend
sentences less well than good readers), we have suggested that difficulties
with phonetic representation may retard certain aspects of syntactic
development among poor readers (Mann et al., 1984). Thus, out of a primary
difficulty with phonetic representation may come second-order difficulties
with other aspects of language development, including syntactic development
and, ultimately, reading acquisition.

Having shown a connection between early reading difficulty and deficits
in phonological processing skills, I will now turn to the results of two
longitudinal studies showing that difficulty with certain phonological
processes can often be found as antecedents of reading failure.

Phonological Processing Skills Can Presage Future Reading Ability

The first study to be described (Mann & Liberman, 1984) reveals that
those kindergarten children who make less effective use of phonetic
representation in a word-string recall task are likely to become the poorer
readers of their first-grade class. The subjects were a population of
6[illegible] kindergarten children whom we followed longitudinally for one
year. During May of the kindergarten year, we assessed IQ, phonological
abilities (to be described later), use of phonetic representation, as indexed
by the ability to repeat strings of rhyming words and strings of nonrhyming
ones, and also nonverbal short-term memory, as indexed by their ability to
repeat a block-tapping sequence on the Corsi Blocks. In May of the following
year, when the children were at the end of the first grade, we again tested
their memory, and also tested their reading ability. At that time, the
teachers rated the children as good, average, or poor in reading ability.



Our finding was that children in the three reading groups did not differ
in age, nor had they differed in kindergarten measures of IQ. Likewise, they
did not differ in nonverbal memory performance as measured by the Corsi block
test, either when they were kindergarteners or when they were in the first
grade. What we did find was that children who differed in reading ability
significantly differed in their ability to repeat a string of spoken words.
In addition, as we had discovered in the past, the extent of difference among
children in the three reading groups was greatest in the case of phonetically
nonconfusable words. Most importantly, these differences had been present
before the children entered the first grade. As kindergarteners, the future
poor readers made significantly more errors than the future good readers on
the word strings, and their performance was not penalized by the presence of
rhyme in the way that the performance of good readers was. Hence we concluded
that the future poor readers did not make as effective use of phonetic
representation as the future good readers did, and that this deficiency
presages reading difficulties in the first grade. (For a more thorough report
of this study and its materials, see Mann & Liberman, 1984.)

The results of a second, newly completed longitudinal study make much the
same point, extending the demonstration that tests of phonological processing
can presage reading success. The results also reveal that screening can be
conducted at an earlier time in the school year, and still effectively predict
future reading ability. The subjects were 11 children tested during January
of the kindergarten year and again the following January when they were in the
first grade. As kindergarteners, they received an IQ test (the Peabody Picture
Vocabulary Test), a verbal memory test (involving immediate, verbatim recall
of seven strings of four unrelated words), a naming test (rapid naming of
a randomized sequence of the capital letters of the alphabet as a measure of
access to phonetic representations), and a syntactic test (manipulating toy
dolls to enact the meaning of eight active sentences and eight passive
sentences). They also received two tests of phonological sophistication, which
will be described in the following section. As first graders, the reading
ability of each child was established by administering the word recognition
and word attack subtests of the Woodcock, and by asking the teachers to rate
the child as good, average, or poor in reading ability. (Statistical
evaluation of the results of this study can be found in the Appendix.)

The general profile of this population is summarized in Table 1, where
the children are grouped according to teachers' ratings of their reading
ability. Children in the three reading groups did not differ in age or IQ, but
did differ in the sum of raw scores on the two Woodcock tests. A summary of
children's performance on the kindergarten tests of language processing
appears in Table 2. The future poor readers were slower at naming the letters
and made more errors than the future good readers did. As in our previous
longitudinal study, the future poor readers also recalled fewer words in the
verbal memory test than either good or average readers. Thus, these two tests
of phonological processing skill, letter naming and temporary verbal memory,
proved capable of distinguishing the future poor readers from the other
children in their classrooms. In contrast, the third test that appears in
Table 2, the test of syntactic ability, was only moderately successful. While
it appeared to distinguish the future poor readers from the good readers, it
did not distinguish them from the average ones.

Table 1

Children Participating in a Longitudinal Study: Basic Profile

Reading Ability             Kindergarten Testing        First-Grade Testing
(rated by first-grade       Mean Age      Mean IQ       Mean Woodcock Raw Score
teachers)                   (in months)   (Peabody)     (Word ID + Word Attack)

Good Readers (N = 10)       69.2          118.5         94.8
Average Readers (N = 22)    72.7          118.1         45.2
Poor Readers                72.2          116.7         16.2

Table 2

Performance on Kindergarten Tests of Linguistic Processing:
Relation to First-Grade Teachers' Ratings of Reading Ability

Reading Ability    Letter Naming             Verbal Memory    Passive Sentence
(first-grade       (mean speed   (mean       (mean words      Comprehension
rating)            in s)         errors)     correct;         (mean items correct;
                                             max. = 28)       max. = 8.0)

Good               21.3          0.0         22.8             8.0
Average            30.7          0.8         16.9             6.5
Poor               46.4          3.3         13.0             6.5

Taken together, the two longitudinal studies support the conclusion that
tests of language processing skill are better predictors of future reading
ability than age, comparable tests of nonverbal memory, or tests of IQ. Tests
of phonological processing skills, in particular, appear to be the best
predictors of future reading ability, although more work is needed before a
final conclusion can be reached. Let me now turn to those additional findings
from each study that reveal that tests of phonological sophistication can also
predict future reading ability.


Phonological Sophistication Can Presage Future Reading Ability 

The first point to be made in this section is that the importance of
phonological sophistication to early reading ability is evident in the oral
reading errors that good and poor beginning readers make. Linguistic analyses
of such errors (see, for example, Fischer, Liberman, & Shankweiler, 1977;
Shankweiler & Liberman, 1972) have shown that the reading difficulties of most
children, including those diagnosed as dyslexic, tend not to involve
deficiencies in visual perception or memory, so much as in phonological
sophistication. Errors involving letter and sequence reversals are relatively
infrequent, as compared to errors that reflect a problem in relating the
structure of the printed word to the phonological structure of the spoken
word.


Additional evidence that phonological sophistication is a special problem
for poor readers can be found in the results of a study I recently conducted
in collaboration with Isabelle Liberman and Hyla Rubin. The subjects were 62
third-graders, who were divided into three reading-ability groups according to
their teachers' ratings. The study involved having children read the words of
Galistel's GE Test of Coding Skills, in which words and phonologically
plausible nonwords are arranged into ten categories according to the complexity
of the phonological relation between the printed and spoken word. Our interest
was in the specific types of words that caused children difficulty. All
children made some errors on the Galistel test, and, as would be expected, the
poorer readers made more errors than the better ones. Apropos of the point
that poor readers are lacking in phonological sophistication, it was found
that those categories that placed the greatest demands on phonological skill
were inordinately difficult for these children.

Another finding that makes a similar point is summarized in Figure 1.
The figure displays an analysis of children's responses in oral reading
according to the familiarity of individual words. This analysis was done by
noting which of the test words were included in the Cheek basal list for the
first and second grades. It can be seen that all children were quite
successful in reading the highly familiar Cheek words. More errors occurred on
the other categories: words not included in the Cheek list and phonologically
plausible nonwords. If visual memory were a problem for the poor readers, we
would have expected them to do inordinately poorly, as compared to the better
readers, on the presumably less familiar words not on the Cheek list, but not
necessarily on the nonwords, which none of the children had seen before.
However, what we found is that the poorer readers had inordinate difficulty
with both the non-Cheek words and the nonwords. That is, they were
distinguished by their poor ability to read any items that place demands on
phonologically based analytic abilities and not by a lack of visual memory, per
se. It may also be noted in Figure 1 that the nature of reading errors is the
same for boys and girls. This is in agreement with our earlier findings in
regard to absence of gender-specific patterns of deficit: the nature of reading
difficulty does not depend on the sex of the child (Liberman & Mann, 1981).
(Statistical analysis of the data appears in the Appendix.) The errors of poor
readers, then, reflect some lack of phonological sophistication.

Figure 1. The proportion of words misread by girls and boys, as a function of
reading ability and stress on phonological sophistication.



As will become apparent, longitudinal work further reveals that
phonological deficiencies can antedate reading difficulty in the first grade.
Before presenting that work in detail, however, it is appropriate for me to
make a few observations about the development of phonological sophistication
and its relation to reading instruction. A variety of evidence suggests that
there is a reciprocal relationship between learning to read an alphabetic
orthography and awareness about phonemes. (But, as we shall see, matters are
somewhat different in parts of the world where a nonalphabetic form of writing
is in use.) First of all, research in the U.S. and England indicates that four-
and five-year-old children are generally lacking in awareness about phonemes,
and that a spurt in their phonological sophistication occurs at age six, when
most of them begin to receive reading instruction (Bradley & Bryant, 1983;
Liberman, Shankweiler, Fischer, & Carter, 1974). Second, in Portugal, where the
writing system is, of course, also alphabetic, it has been found that most
illiterate adults cannot add or delete initial phonemes in spoken utterances as
well as literate ones can (cf. Morais, Cary, Alegria, & Bertelson, 1979).
Third, in Japan, where nonalphabetic orthographies are employed, I have found
that first-grade children cannot count, delete, or reverse phonemes as easily
as American children of the same age. Nonetheless, some other work provides
evidence that reading instruction is not the only determinant of phonological
sophistication. For example, I have found that some Japanese children are
aware of phonemes, regardless of their lack of exposure to the alphabetic
principle. It has also been noted that some English-speaking children fail to
acquire phonological sophistication despite considerable instruction in the
use of the alphabet (Bradley & Bryant, 1978). (It is for all of these reasons,
perhaps, that studies employing widely diverse subject populations, school
systems, and measurement devices indicate a strong correlation between
phonological sophistication in kindergarten and later success in learning to
read.)

The first longitudinal study described in the previous section is a case
in point. In that study (Mann & Liberman, 1984), we assessed the phonological
sophistication of kindergarten children by requiring them to induce the rules
of a game that involved counting the number of syllables in spoken words.
Syllable counting was measured instead of phoneme counting, because awareness
of syllable-sized units can be expected to precede awareness of phonemes, and
is probably a natural cognitive achievement of sorts, since it can be present
in preschool children (Liberman et al., 1974). Moreover, unlike phoneme
awareness, syllable awareness is not strongly facilitated by a phonics-based
program of reading instruction (Alegria, Pignot, & Morais, 1982). We found
that the future poor readers, as kindergarteners, scored lower on the syllable
counting task, often performing at chance level, and rarely achieved the six
correct responses in a row needed to pass criterion. The future good readers
tended to receive the highest scores and to do considerably better than chance,
and most of them passed criterion. The performance of the average readers
fell in between these two extremes.
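
The pass criterion used in the syllable counting game can be stated
procedurally. The following Python sketch is offered only as an illustration:
the function name, the coding of a child's responses as a list of true/false
values, and the two sample response sequences are assumptions made here and
are not taken from the original test materials.

```python
def reached_criterion(responses, run_length=6):
    """Return True if the responses contain `run_length` correct answers in a row."""
    streak = 0
    for correct in responses:
        # Extend the current run on a correct answer, reset it on an error.
        streak = streak + 1 if correct else 0
        if streak >= run_length:
            return True
    return False


# Hypothetical response sequences (True = correct), invented for illustration.
passing_child = [False, True, True, True, True, True, True, False]
failing_child = [True, True, False, True, True, True, False, True, True, True]

passed = reached_criterion(passing_child)
failed = reached_criterion(failing_child)
```

Under this coding, a child who answers most items correctly but never strings
six correct answers together still fails criterion, which is why criterion and
total score are reported separately.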

Turning now to the second longitudinal study, we measured both syllable
and phoneme awareness under the guise of a "talking backwards" game. Syllable
awareness was measured by requiring children to reverse the order of the
syllables in a two- or three-syllable nonsense word; phoneme awareness was
measured by requiring children to reverse the order of the phonemes in a
two-phoneme nonsense syllable. The results are shown in Table 3, where it can
be seen that, on each test, the future poor readers did worse than the future
average readers and the good readers did best of all. The strongest
differences, however, and the ones that are the most predictive, involve
performance on the phoneme reversal test.



Table 3

Performance on Kindergarten Tests of Linguistic Awareness
in Relation to First-Grade Teachers' Ratings of Reading Ability

Reading Ability     Syllable Reversal        Phoneme Reversal
(first-grade        (mean items correct;     (mean items correct;
rating)             max.=16)                 max.=16)

Good                14.6                     5.2
Average             14.4                      .8
Poor                                          .0

To summarize, then, in addition to tests of phonetic processing ability,
tests that require manipulation (i.e., counting, reversal) of the syllables or
phonemes in spoken words can also effectively presage future reading ability.
Perhaps the best predictor of this type would be a test involving some
manipulation of phoneme-sized units, although this has the potential
disadvantage of confounding differences in native ability with differences in
extent of exposure to reading instruction.

Some Remarks on the Prediction and Prevention of Early Reading Difficulty

As has been noted elsewhere (Mann & Liberman, 1984), the primary
contribution of our longitudinal research is to suggest that, among
kindergarteners, the status of certain phonological skills (verbal short-term
memory, letter naming ability, awareness about syllables, and awareness about
phonemes) may presage first-grade reading ability. Tests of these skills might
therefore be used as part of a kindergarten screening battery. In this light,
I will consider some of the practical implications of each of the longitudinal
studies I have described.

The first indicates that measures of two skills, performance in recalling
a string of nonrhyming words and performance in counting the number of
syllables in spoken words, can together account for about a quarter of the
total variance in children's first-grade reading ability. The success of these
two measures lies not in their ability to predict fine differences in ability,
but in their ability to predict the extremes of reading ability. A child who
does well on both tasks is not at risk for future reading problems, whereas
children who fall within the lower quartile of a kindergarten population in
their performance on both tasks have a significant likelihood of encountering
reading difficulty.

A somewhat finer grade of predictive success can be achieved using the
results of the second, more recent, longitudinal study. When kindergarten
performance on three measures (letter naming speed, accuracy of word string
recall, and accuracy in reversing two-phoneme utterances) is entered into a
regression equation, these measures account for 7?% of the variance in raw
scores on the Woodcock tests. Hence children who rank in the lower quartile of
the class in letter naming ability, verbal memory, and phoneme awareness should
surely be considered at risk. As for those who are deficient in only one or
two of these skills, with future research it should be possible to determine
the relative importance of these factors in terms of their contribution to the
likelihood of a child's encountering difficulty in learning to read.
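
The lower-quartile rule suggested above can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not a validated
screening procedure: the function name, the eight children's scores, and the
choice of the default quartile method in Python's statistics module are all
assumptions introduced here, and the scores are invented rather than drawn from
the study.

```python
from statistics import quantiles


def worst_quarter(scores, higher_is_better=True):
    """Flag each score that falls in the worst quarter of the class."""
    first_cut, _, third_cut = quantiles(scores, n=4)
    if higher_is_better:
        return [score <= first_cut for score in scores]
    return [score >= third_cut for score in scores]


# Hypothetical kindergarten scores for eight children (invented for illustration).
naming_seconds = [20, 22, 25, 28, 30, 33, 45, 50]   # slower naming is worse
memory_words = [24, 23, 22, 20, 18, 17, 12, 10]     # more words recalled is better
phoneme_reversal = [6, 5, 5, 3, 2, 1, 0, 1]         # more items correct is better

flags = zip(
    worst_quarter(naming_seconds, higher_is_better=False),
    worst_quarter(memory_words),
    worst_quarter(phoneme_reversal),
)
# A child is treated as at risk only when flagged on all three measures.
at_risk = [all(child_flags) for child_flags in flags]
```

In this invented class only the last two children are flagged on all three
measures. The sixth child is weak on phoneme reversal alone and so falls into
the group whose risk, as noted above, remains to be determined.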


The development of tests that successfully identify children at risk for
reading problems is surely an accomplishment. However, what is one to do with
the child considered at risk? Teachers and others interested in the question
of how to prevent reading problems would do well to read what Isabelle Liberman
has written on this subject (Liberman, 1982; Liberman, Shankweiler, Blachman,
Camp, & Werfelman, 1980). Here, I would like to focus briefly on several
points, most of which have been made elsewhere (see, for example, Liberman,
1982, or Mann & Liberman, 1984).

Considering first the child who is lacking in the phonological processing
skills necessary for effective verbal short-term memory and letter-naming
ability, it must be said that the prospects for remediation of these
deficiencies have not really been explored. There is considerable reason to
consider the possibility that deficient phonological processing skill is a
consequence of a specific maturational lag in language development. However,
the concept of maturational lag typically implies that the poor readers will
"catch up" to the good readers, given enough time, and this implication is
inconsistent with findings that specific verbal deficiencies sometimes persist,
as in adolescents suffering from developmental dyslexia (cf. Mann, in press,
and Mann & Liberman, 1984, for appropriate references). Thus it is possible
that the language processing deficiencies we may detect in the kindergarten
child will be of a permanent nature. This is not to say that remediation is a
hopeless cause, for we may still expect that, through appropriate intervention,
the extent of the deficiency can be lessened. Unfortunately, research has not
yet specified the exact form of remediation that is most desirable. It is
logical to think that children might be helped by practice in naming letters
and objects, as well as by practice in learning nursery rhymes and stories by
heart. Such practice might help children to exercise those language processing
skills that they do possess. Yet it must be kept in mind that remedying a
specific symptom need not remedy the underlying cause of that symptom. Clearly
much research is needed in this area.

The prospect may be brighter, however, with regard to remediating
deficiencies in phonological sophistication. While it is true that some
aspects of phonological sophistication, such as initial awareness about
syllables, tend to be natural cognitive achievements, much of the development
of phonological awareness may be facilitated (if not precipitated) by
experience that encourages the child to manipulate phonological structure. For
some children this experience may involve no more than learning the
correspondences between certain written and spoken words. Even minimal
exposure may be enough to enable some children to discover the alphabetic
principle for themselves. This is probably true of unexpectedly precocious
readers, and of most children who survive the basal method of beginning reading
instruction. Yet other children don't discover that principle for themselves,
and may need some systematic training in order to achieve the level of
sophistication about phonemes and phonological rules that is required for
skilled reading of English. With all due respect to Socrates, it isn't really
good educational policy to make all children reinvent the alphabet for
themselves; we should let them in on the secrets of alphabetic transcription as
early as possible.

How is this to be done? To begin with, children should be read to, and
their attention should be directed to the printed words that correspond to the
spoken words of their favorite story. Teachers and parents can use many
indirect methods to draw children's attention to phonological structure:
teaching nursery rhymes and poetry, for example, or encouraging secret
languages like "pig latin" or "talking backwards." They can, for example, give
children special nicknames that involve some systematic manipulation of
phonological structure (such as reversing the order of syllables, or dropping
the first phoneme), and then ask the children to invent similar nicknames for
their siblings and friends. Once attention is directed to phonological units,
direct awareness training can be instigated through counting games or elision
games, starting at the less abstract level of the word and working down to the
level of the phoneme. Finally, phoneme awareness and reading could be
introduced with the procedure of Elkonin.

The Elkonin procedure has been described elsewhere (Liberman, 1982;
Liberman et al., 1980; Mann & Liberman, 1984) and for the sake of brevity I
will only review its merits. It provides a linear visuospatial structure to
which the temporal sequence of phonemes in a spoken word can be related. It
gives the child the actual number of phonetic segments in a word so that
uninformed guessing is not necessary. Explicit naming of pictures is required
and can exercise the child's ability to access the phonetic representation of
a word rapidly. Since the picture is always present, and only one is
considered at a time, demands on verbal short-term memory are minimal. For all
of these reasons, the Elkonin procedure is especially advantageous for use with
children who, by virtue of inferior phonological sophistication, naming
ability, and verbal short-term memory, have been identified as at risk for
future reading problems. If adopted for general use, it could help to
ameliorate reading difficulty, and might be expected to speed the progress of
any beginning reader.

Appendix

Statistical Evaluation of Experimental Results 

Longitudinal Study of Language Processing Skills, Linguistic Awareness,
and Reading Ability. Statistical analyses of the data summarized in Tables
1-3 include t-tests of differences between the scores of good and poor
readers, and Pearson correlations between various scores and a measure of
reading ability. Turning first to the data in Table 1, the good and poor
readers differed in the sum of raw scores on the Woodcock Word Attack and Word
Identification, t(20) = 5.3, p < .002, although they did not differ in age or
in IQ at the .05 level of confidence. As for the data in Table 2, as
kindergarteners, the future good and poor readers differed in all four
measures of language processing: 1) speed of letter naming, t(20) = 3.32,
p < .01; 2) errors in naming the letters, t(20) = 5.91, p < .0003; 3) verbal
memory, t(20) = 2.2, p < .05; and 4) comprehension of passive sentences,
t(20) = 3.6, p < .01. Pearson product moment correlations revealed significant
associations between the first-grade sum of raw scores on the Woodcock Word
Identification and Word Attack subtests and the kindergarten measures of
letter naming speed, r(44) = -.42, letter naming errors, r(44) = -.52, and
verbal memory, r(44) = .56, all of which are significant beyond the .05 level.
The correlation between Woodcock scores and comprehension of passive sentences
failed to reach significance at the .05 level of confidence. Finally, as for
the data in Table 3, which concerns performance on the two tests of language
awareness, good and poor readers significantly differed in performance on the
phoneme reversal test, t(20) = 9.2, p < .0002, but not on the syllable reversal
test. Likewise, a significant correlation existed between Woodcock scores and
performance on the phoneme reversal test, r(44) = .75, although the correlation
between Woodcock scores and performance on the syllable reversal test failed
to reach significance at the .05 level of confidence.
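
For readers who wish to see how a comparison of the kind reported above is
computed, the following Python sketch calculates an independent-samples t
statistic. It assumes the conventional pooled-variance form, which the text
does not specify, and it assumes 10 good readers and 12 poor readers, the
latter figure being inferred here from the reported 20 degrees of freedom. The
scores themselves are invented for illustration and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical phoneme reversal scores, invented for illustration only.
good = [6, 5, 7, 4, 5, 6, 5, 4, 6, 4]          # 10 children
poor = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]    # 12 children (assumed group size)

t_value, df = pooled_t(good, poor)
```

With groups of 10 and 12 children the degrees of freedom come to 20, which
matches the t(20) notation used throughout this part of the appendix.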



Reading Errors Among Good, Average and Poor Readers in the Third Grade.
The teacher ratings of reading ability are confirmed by the finding that, when
raw scores on the Woodcock Word Identification and Word Attack Tests were
summed, the poor readers had correctly read an average of 133.2 words, whereas
average readers had read 156.2 and good readers, 175.6. Statistical analysis
of the reading errors made on the GE test (summarized in Figure 1) consisted
of an analysis of variance involving reading ability, sex, GE category and
Cheek category. That analysis revealed that children in the three reading
groups differed in their overall performance, with the better readers making
fewer errors than the poorer ones, and the average readers falling between,
F(2,56) = 31.55, p < .0001. Certain parts of the GE test were harder than
others, F(9,504) = 75.38, p < .0001, but the poorer readers encountered
inordinate difficulty with these parts of the test relative to the better
readers, F(18,504) = 6.28, p < .0001. In general, the Cheek words were easier
to read than non-Cheek words, which, in turn, were easier than the
phonologically plausible nonwords, F(2,112) = 205.5, p < .0001. Most
importantly, as compared to the better readers, the poorer readers encountered
much less difficulty with the Cheek words than with the words that were not on
the Cheek list, and with the phonologically plausible nonwords,
F(4,112) = 22.9. This basic pattern of results was not a function of the sex of
the child, as the main effect of sex, and all interactions involving sex,
failed to reach significance at the .05 level of confidence.
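
The degrees of freedom attached to the F ratios above can be checked with a
little arithmetic. The Python sketch below assumes a mixed design in which
reading ability and sex vary between children while GE category and item type
(Cheek word, non-Cheek word, nonword) vary within children. That design is an
assumption made here because the text does not describe it in detail; the
counts of 62 children, three reading groups, and ten GE categories are taken
from the description of the third-grade study.

```python
# Design sizes: 62 children, 3 reading groups, 2 sexes, 10 GE categories,
# and 3 item types (Cheek words, non-Cheek words, nonwords).
n_children, n_groups, n_sexes = 62, 3, 2
n_ge_categories, n_item_types = 10, 3

# Between-subjects error term: children minus the number of group-by-sex cells.
between_error = n_children - n_groups * n_sexes

degrees_of_freedom = {
    "reading ability": (n_groups - 1, between_error),
    "GE category": (n_ge_categories - 1,
                    (n_ge_categories - 1) * between_error),
    "ability x GE category": ((n_groups - 1) * (n_ge_categories - 1),
                              (n_ge_categories - 1) * between_error),
    "item type": (n_item_types - 1,
                  (n_item_types - 1) * between_error),
    "ability x item type": ((n_groups - 1) * (n_item_types - 1),
                            (n_item_types - 1) * between_error),
}
```

Under these assumptions the five pairs come out as (2, 56), (9, 504),
(18, 504), (2, 112), and (4, 112), in agreement with the values reported for
the corresponding effects.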
TEMPORARY MEMORY FOR LINGUISTIC AND NONLINGUISTIC MATERIAL IN RELATION TO THE
ACQUISITION OF JAPANESE KANA AND KANJI

Virginia A. Mann



Abstract. It has been found that good and poor beginning readers of
the alphabet perform equivalently in temporary memory for such
nonlinguistic visual material as abstract designs and faces. Good
readers excel, however, in temporary memory for such linguistic
material as printed or spoken words, because they make more effective
use of phonetic representation. To determine whether linguistic
and nonlinguistic memory skills have the same relationship to reading
skills among good and poor beginning readers of nonalphabetic
orthographies, the present study focused on beginning readers of
Japanese Kana and Kanji. Two experiments employed the recurring
recognition paradigm of Kimura to assess the relationship between
temporary memory for various types of material and Japanese children's
reading ability in the second grade. The first experiment
examined temporary memory for spoken nonsense words, and revealed
that good and poor readers of Japanese differ in the use of
nonalphabetic orthographies. The second explored the relationship
between memory for nonlinguistic visual material and orthographic
material including: 1) abstract designs, 2) faces, 3) Hiragana, and
4) Kanji. It revealed that children who differed in reading ability
differed in memory for Kana and Kanji, but, unlike good and poor
readers of the alphabet, also differed in memory for the abstract
designs. In particular, memory for Kana was significantly related
to that for Kanji and spoken syllables, but not abstract designs or
faces, whereas memory for Kanji was significantly related to that
for nonsense designs and spoken syllables, but not faces. The
implication is that effective use of phonetic
representation contributes to successful acquisition of all
orthographies, whereas the importance of nonlinguistic memory skills
can depend on the nature of the orthography at hand.

Temporary memory is important to the would-be reader of any orthography.
Whether material is written in an alphabet, a syllabary, or a logography, the
reader who intends to comprehend its full meaning must be able to retain the
information represented by individual characters, until such larger units as
words, sentences, or paragraphs can be apprehended. How beginning readers of
syllabaries and logographies meet these temporary storage requirements is the
issue at stake in this study, which examines the relation between temporary
memory skills and success in learning to read Japanese. Several different
types of material are to be considered, as an accumulating body of evidence
from the psychological (Ellis, 1975; Woodhead & Baddeley, 1981) and
neuropsychological literature (see, for example, Kimura, 1963; Milner &
Taylor, 1972; Warrington & Shallice, 1969) reveals that temporary memory can
involve separable components. One such component employs phonetic
representation as a means of retaining linguistic material such as names of
objects, spoken or printed words, etc., and is localized within the left,
language dominant hemisphere. This stands in contrast to another component,
which employs nonlinguistic representations as a means of retaining such
nonlinguistic materials as abstract designs and faces, and is localized within
the right hemisphere. The question to be asked over the course of two
experiments is whether the linguistic and nonlinguistic components of temporary
memory make equivalent contributions to success in learning to read Kana and
Kanji.

In recent years, studies in America and Europe have asked whether success
in learning to read alphabetic orthographies is related to the ability to
remember certain types of information. These studies reveal that not all
temporary memory abilities tend to distinguish good and poor beginning readers
of the alphabet. For example, good and poor readers in the second grade do not
significantly differ in the ability to remember such nonlinguistic visual
material as photographs of people's faces or abstract visual designs (see,
for example, Liberman, Mann, Shankweiler, & Werfelman, 1982; Vellutino,
Steger, DeSetto, & Phillips, 1975). Yet it is quite evident that good readers
surpass poor readers in temporary memory for syllables, words, and sentences,
whether these are heard or read (see, for example, Byrne & Shea, 1979;
Mann, 1984; Mann, Liberman, & Shankweiler, 1980; Mark, Shankweiler, Liberman,
& Fowler, 1977; Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979). This
has been explained by appeal to evidence that superior readers make effective
use of phonetic representation in temporary memory (Shankweiler et al., 1977).

Attempts to clarify and explain the association between effective use of
phonetic representation and early reading ability have tended to focus on
beginning readers of English (see, for example, Brady, Shankweiler, & Mann,
1983; Katz, Shankweiler, & Liberman, 1981; Mann, in press; Mann & Liberman,
1984; Mann et al., 1980; Mann, Shankweiler, & Smith, 1984; Shankweiler,
Liberman, Mark, Fowler, & Fischer, 1979), or of other alphabetically
transcribed languages such as French (Alegria, Pignot, & Morais, 1982), Swedish
(Lundberg, Olofsson, & Wall, 1980) and Dutch (Mann, 1982). One possible
explanation is that learning to decode a phonetic transcription of spoken
language places certain demands on memory for phonetic material (see, for
example, Shankweiler & Liberman, 1976). If so, the association between use of
phonetic representation and reading skill might be restricted to readers of
the alphabet (conceivably extending to readers of a syllabary, since a
syllabary is a type of phonological transcription) but not to readers of a
logography, since logographies do not transcribe the phonological structure of
spoken words directly. Another possibility is that phonetic representation is
critical to all language processing, spoken and written alike, because it
meets the temporary storage requirements involved in recovering phrases,
sentences and paragraphs from sequences of individual words (see Liberman,
Liberman, Mattingly, & Shankweiler, 1980; or Mann, 1984). In this case, the
relationship between phonetic representation and reading ability should extend
to readers of any orthography (alphabet, syllabary or logography), because all
of these require that readers recover phrases, sentences, etc. one way or
another.


One straightforward means of determining why use of phonetic representation
is associated with reading skill is to examine the use of phonetic
representation by good and poor readers of syllabaries and logographies.
Japanese has the virtue of using both; however, relatively little attention
has been devoted to the temporary memory skills associated with acquiring
Japanese. As some type of temporary memory should be essential to readers of
syllabaries and logographies, we would expect some relationship between
temporary memory skills and success in learning to read Japanese. It is even
possible that the relationship between reading and memory skill will involve
linguistic memory, and phonetic memory in particular, given some evidence that
for both Japanese children and American children, memory for the meaning of
spoken text, as well as serial memory for words and digits, is associated with
reading ability in the fifth grade (Stevenson, Stigler, Lucker, Hsu, &
Kitamura, 1982). Yet there is no direct evidence about the use of phonetic
representation by children who are in the early grades in Japan.

Certainly it is possible that effective use of phonetic representation
characterizes good readers of nonalphabetic orthographies just as it
characterizes good readers of the alphabet. This follows from a consideration
of the fundamental nature of all orthographies, and from an observation about a
coding strategy that is common to skilled readers of Chinese and English. All
orthographies function to transcribe spoken language; hence it would be
parsimonious if reading drew upon some of the processes that otherwise support
spoken language use. Skilled readers who are attempting to remember written
words in order to comprehend written sentences and paragraphs might rely on
phonetic representation, as phonetic representation fulfills the temporary
memory requirements of comprehending spoken sentences and paragraphs.
Confirmation that this is indeed the case has been provided by experimental
studies showing that both the temporary memory for orthographic material
(including isolated letters, printed nonsense words, and real words and
sentences) and the comprehension of written text involve recoding print into a
phonetic representation. Most importantly, phonetic representation is employed
in the service of temporary memory and comprehension whether subjects are
reading the English alphabet (see, for example, Daneman & Carpenter, 1980;
Kleiman, 1975; Levy, 1977; Meyer, Schvaneveldt, & Ruddy, 1974; Slowiaczek &
Clifton, 1980) or the Chinese logography (Hung & Tzeng, 1981; Tzeng, Hung, &
Wang, 1977).

Nonetheless, it is possible that the acquisition of nonalphabetic writing
systems places a less severe demand on phonetic memory than the alphabet does.
Before beginning readers can begin to comprehend phrases and sentences, they
must learn to decode individual words. While all orthographies serve to
transcribe the words of spoken language in one way or another, they differ in
the nature of the units they transcribe: alphabets transcribe phonemes,
syllabaries such as the Japanese Kana transcribe syllables, and logographies
such as Chinese and Japanese Kanji transcribe words. It could be argued that
these differences have consequences for the importance of phonetic
representation to children's initial acquisition of word decoding skills. The
beginning reader of the alphabet, for example, who is attempting to recognize
a written word like "kitten," must be able to integrate the phonemes that the
letters represent, and this may require effective use of phonetic
representation. For the reader of a syllabary, integrating a sequence of
syllables like "neko" into a word may also involve phonetic representation,
but the demand could be milder than that of the alphabet, insofar as syllables
are less abstract phonological units than phonemes, and there are typically
fewer of them in a word. Finally, for the reader of a logography, word
decoding may place almost no demand on phonetic representation. Since the
characters of logographies transcribe words on a one-to-one basis, recognizing
a word could be an all-or-none process that does not require that phonemes or
syllables be retained in temporary memory.

Thus it is an open question whether use of phonetic representation will
distinguish good and poor beginning readers of Japanese. Likewise, it is
unknown whether, like good and poor readers of the alphabet, good and poor
readers of Kana and Kanji tend to possess equivalent nonlinguistic memory
skills. Here it is conceivable that the acquisition of the Kana syllabaries
and the Kanji logography could place a certain demand on visual memory
systems. Syllabaries, to some extent, and logographies, in particular, involve
considerably more orthographic units than the alphabet. Thus would-be readers
of Japanese must encode and remember more visual shapes. For mature readers of
Japanese, the ability to read Kana and Kanji, like that to read the alphabet,
tends to be associated with the integrity of the linguistic faculties of the
left hemisphere (Sasanuma, 1975; Sasanuma & Fujimura, 1971). Hence, it appears
unlikely that the visual memory demands of Kana and Kanji cause skilled
readers of Japanese to place inordinate reliance on nonlinguistic memory
skills. Nonetheless, for the beginning reader who is acquiring an initial
knowledge about orthographic characters, it is possible that learning to
remember more-or-less abstract patterns of lines and curves could demand
effective use of nonlinguistic memory skills. If so, good beginning readers of
Japanese may surpass poor readers in their ability to hold certain types of
nonlinguistic material in temporary memory.

A rationale has been offered that the type of temporary memory abilities
that are crucial to success at learning to read may depend on the type of
orthography at hand. Effective use of phonetic representation could
contribute to the attainment of early reading skill among American children
because it fulfills the temporary storage requirements of all language
processing, or because it fulfills certain specific requirements of learning
to decode a phonographic transcription. Likewise, effective visual memory
skills, which bear little association to American children's skills in
learning to read, could be of limited utility only to readers of alphabetic
orthographies, or could be of limited utility to all beginning readers. In
the two experimental studies that follow, an attempt was made to discern
whether phonetic representation and various types of nonlinguistic memory
abilities distinguish good and poor beginning readers of Kana and Kanji. The
design is prompted by a previous study of American children that used the
recurring recognition paradigm of Kimura (1963) to assess good and poor
readers' ability to remember alphabetically written material, visual nonsense
designs, and photographs of unfamiliar faces (Liberman et al., 1982). The
first experiment extends this methodology to the use of spoken nonsense
materials, in order to determine whether effective use of phonetic
representation distinguishes good and poor readers of Japanese in the second
grade. The second experiment compares memory for the two types of orthographic
material, Kana and Kanji, with that for two types of nonlinguistic visual
material, abstract designs and faces.

Experiment 1 



Methods 

Subjects. The subjects were second-grade children attending the primary
school attached to Ochanomizu University in Tokyo, Japan. All available
children participated, including 50 girls and 50 boys, of mean age 86.6 months
(SD = 3.5 months). At the completion of the study, each child was rated by his
or her classroom teacher as either good, average, or poor in reading ability.

Materials. The materials comprised 52 two-mora (i.e., disyllabic)
pseudoword items that were phonologically plausible according to the
intuitions of five native speakers of Japanese (three members of the staff of
the Institute for Logopedics and Phoniatrics, a Japanese linguist, and a
teacher of Japanese). The pseudowords were so constructed that all Japanese
consonants and vowels were represented in a variety of combinations (with the
exception of consonant-[y] clusters as in [kyo]). Both V and CV mora occurred,
with the restriction that no mora occurred more than once in either initial or
final position.

Memory for these materials was assessed according to the recurring
recognition paradigm of Kimura (1963). This required that 80 test items be
constructed, each item selected from the pool of 52 items. Four items were
repeated eight times each (the recurring items) and the remaining 48 were used
once each (the nonrecurring items). In compiling the test, the items were
divided into eight sets of ten items each, with each set containing the four
recurring items randomly interspersed with six of the nonrecurring items. The
first set of ten items constituted the inspection set; the remaining seven
constituted the recognition set of 70 test items. The test was administered
by a male native speaker of Japanese who read each item aloud with a flat
intonation at a rate of one every five seconds.
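
The construction of the test sequence just described can be illustrated with
a short sketch. It assumes a pool of 52 placeholder labels and a fixed random
seed; the function name, the labels, and the choice of which four items recur
are hypothetical and are not taken from the original materials.

```python
import random


def build_recurring_recognition_test(pool, n_recurring=4, n_sets=8,
                                     set_size=10, seed=0):
    """Arrange a pool of items into sets for a recurring recognition test.

    Hypothetical helper: the recurring items appear once in every set, and
    each nonrecurring item appears exactly once in the whole sequence.
    """
    rng = random.Random(seed)
    items = list(pool)
    rng.shuffle(items)
    recurring = items[:n_recurring]
    nonrecurring = items[n_recurring:]
    per_set = set_size - n_recurring  # six nonrecurring items per set
    if len(nonrecurring) != per_set * n_sets:
        raise ValueError("pool size does not fit the requested design")
    sets = []
    for i in range(n_sets):
        block = recurring + nonrecurring[i * per_set:(i + 1) * per_set]
        rng.shuffle(block)  # intersperse recurring and nonrecurring items
        sets.append(block)
    return recurring, sets


pool = [f"item{i:02d}" for i in range(52)]  # placeholder labels, not real stimuli
recurring, sets = build_recurring_recognition_test(pool)
inspection_set = sets[0]                                     # first ten items
test_items = [item for block in sets[1:] for item in block]  # remaining 70
print(len(inspection_set), len(test_items))  # prints: 10 70
```

With 52 items and four of them recurring, the design accounts for every item:
4 x 8 recurring presentations plus 48 single presentations give the 80 items
of the full sequence.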

Procedure

All children were tested in their classrooms while seated with their
classmates in their normal seating arrangement. Testing was completed during
a single session conducted in the early afternoon after school hours. The
instructor told the children that he would read to them some words they had
never heard before (i.e., nonsense words) and that their task was to listen to
the initial set of ten words and to try to remember each of them; afterwards
they would hear the test set of 70 words, and were to mark their responses on
a sheet numbered from 1 to 70 with a single box next to each number. They
were to put an "O" in the corresponding box if the word had occurred in the
initial set; otherwise they should put an "X" in the box. At this point,
presentation of the ten initial items began, immediately followed by
presentation of the 70 test items. To aid children in the correct use of the
response sheet, the instructor said the number of each test item prior to
reading the item aloud.



Mann: Memory Skills and Reading in Japanese 



Results and Discussion 



Following the methodology employed in the previous study of American
children (Liberman et al., 1982), the data were analyzed in terms of the total
number of correct responses, summing over the seven sets of items, and
including both the number of correct recognitions of recurring items and the
number of correct rejections of nonrecurring items. To determine whether this
data reduction procedure masked any critical findings, the data obtained from
one of the three classrooms of children (n = 33) were subjected to a more
detailed analysis. That analysis indicates that a consideration of early vs.
late sets or of recognition vs. rejection would not alter the basic pattern of
findings and the interpretation of them.
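
The scoring rule described above (correct recognitions of recurring items
plus correct rejections of nonrecurring items) can be sketched as follows.
The function name and the toy response data are hypothetical; "O" and "X" are
the two response marks used on the children's answer sheets.

```python
def score_recognition(test_items, responses, recurring):
    """Count correct recognitions and correct rejections.

    Hypothetical scoring helper: an "O" on a recurring item is a correct
    recognition, and an "X" on a nonrecurring item is a correct rejection.
    """
    if len(test_items) != len(responses):
        raise ValueError("one response is required per test item")
    recurring = set(recurring)
    hits = 0
    correct_rejections = 0
    for item, mark in zip(test_items, responses):
        if item in recurring and mark == "O":
            hits += 1
        elif item not in recurring and mark == "X":
            correct_rejections += 1
    return {
        "hits": hits,
        "correct_rejections": correct_rejections,
        "total_correct": hits + correct_rejections,
    }


# Toy example with made-up items; "a" and "b" play the role of recurring items.
recurring = {"a", "b"}
test_items = ["a", "c", "b", "d", "a", "e"]
responses = ["O", "X", "X", "O", "O", "X"]
result = score_recognition(test_items, responses, recurring)
print(result)  # {'hits': 2, 'correct_rejections': 2, 'total_correct': 4}
```

Keeping the two counts separate makes it possible to run the finer-grained
check mentioned in the text (recognition vs. rejection) while still reporting
the single total score used in the main analysis.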

The main purpose of this experiment was to determine whether, like
beginning readers of alphabetic orthographies, children who differ in the
ability to read Japanese differ in memory for nonsense words. The standpoint
from which I will attempt to answer this question is the teachers' ratings of
the children's reading ability, a measure that does not separate skill in
reading the Kana syllabaries from that in reading the Kanji logography.
Attention to the separate contribution of each skill will be a concern of
Experiment 2; here, the analysis focuses on the question of whether reading
ability, in general, is related to memory for spoken nonsense words.

The answer is affirmative: good readers surpassed poor readers in memory
for the nonsense words, as the following analyses will show. On the average,
performance was significantly better than the chance level of 50% correct,
t(99) = 40.0, p < .001, although children differed considerably in the
accuracy of their responses. The highest score was 65 items correct (out of
70), the lowest was 36, and the mean was 57.0, or 81% correct. The 15 children
whom their teachers deemed to be good readers achieved a mean score of 62. k
correct, which is significantly better than the mean score of 45 correct
achieved by the 10 children who were deemed to be poor readers, t(23) = 6.18,
p < .001. When the scores of all children (good, average, and poor readers)
are considered, the relation between memory performance and reading ability is
positive, and a modest but significant correlation is found, r(100) = .25,
p < .006.
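
The two kinds of t statistic reported in this paragraph can be sketched as
follows. The sketch assumes that the comparison with chance is a one-sample
test against a score of 35 (50% of 70 items) and that the group comparison is
a pooled-variance test, which is consistent with the 23 degrees of freedom
reported for groups of 15 and 10. The function names and the scores below are
made up for illustration and are not the children's data.

```python
import math
from statistics import mean, variance


def one_sample_t(scores, mu0):
    """Return (t, df) for a one-sample t test of the mean against mu0."""
    n = len(scores)
    standard_error = math.sqrt(variance(scores) / n)
    return (mean(scores) - mu0) / standard_error, n - 1


def pooled_two_sample_t(group_a, group_b):
    """Return (t, df) for an independent-samples t test with pooled variance."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df


chance_score = 0.5 * 70  # 35 items correct expected by guessing

# Made-up scores for illustration only.
good = [63, 61, 64, 60, 62]
poor = [44, 47, 43, 46, 45]

t_chance, df_chance = one_sample_t(good + poor, chance_score)
t_groups, df_groups = pooled_two_sample_t(good, poor)
print(round(t_chance, 2), df_chance)
print(round(t_groups, 2), df_groups)
```

With group sizes of 15 and 10, the pooled test has 15 + 10 - 2 = 23 degrees of
freedom, matching the t(23) values in the text.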

The results of this experiment therefore suggest that children who are
good beginning readers of Japanese, like those who are good beginning readers
of English (Byrne & Shea, 1979), tend to excel poor readers in temporary
memory for spoken nonsense words. The implication is that children who are
particularly successful readers of nonalphabetic orthographies tend to make
more effective use of phonetic representation than children who read less
well. Whether use of phonetic representation is of equal importance to
successful acquisition of the Kana syllabary and the Kanji logography remains
to be explored in the second experiment, along with the question of whether
memory for nonlinguistic material is related to successful acquisition of
either Kana or Kanji.



Experiment 2

Methods



Subjects. The subjects of Experiment 2 were the same 100 children who
participated in Experiment 1.

Materials. The materials included four different types of visual stimuli:
abstract designs, photographs of faces, Hiragana, and Kanji. Memory for
each type of stimulus was assessed separately, using tests modeled on Kimura's
(1963) recurring recognition paradigm. A description of each of the materials
follows: 1) The Kimura abstract designs were used according to a modified
test procedure by Liberman et al. (1982). These consisted of irregular,
nonrepresentational line drawings. 2) The face materials were black and white
photographs of male faces in half profile taken from a high school yearbook.
Half were looking to the left, and half to the right. Preparation of the
materials was as in Liberman et al. (1982). In order to minimize
distinguishing details that might lend themselves to verbal labeling, none of
the photographs showed teeth, facial hair, eyeglasses, or distinctive marks
such as scars. In addition, a uniform mask was applied to each picture to
cover hair and background detail. 3) The Hiragana materials were meaningless,
phonetically plausible digraphs that combined two characters, one above the
other, which had been photographed from a set of flash cards. Each digraph
was a transcription of one of the 52 basic items employed in Experiment 1; in
the test sequence, different stimuli recurred and a different order of
presentation was employed. 4) The Kanji materials were chosen from the
approved set of Kanji that children master in the first grade. The characters
were photographed from large-sized prototypes contained in a standard
dictionary.

The preparation of the memory test for each type of material was as
described in Experiment 1. From each set of 52 stimulus items, a test set of
80 items was constructed, with four of the items recurring eight times each,
and 48 of the stimuli occurring once. The items were divided into eight sets
of ten; within each set, the four recurring items were interspersed with six
nonrecurring ones. The first set of ten constituted the inspection set, and
the remaining seven sets contained the 70 test items.

Procedure 

Experiment 2 was run together with Experiment 1, in two sessions one week
apart. The Kana digraphs and the faces were presented in the first session;
the Kanji and the nonsense designs were presented in the second. All stimuli
were projected on a large screen at the front of the room by means of a Kodak
carousel projector. The subtended angle was sufficiently great for easy
visibility from all parts of the room.

The procedure was analogous to that in Experiment 1 and was the same for
all four types of materials. The instructor began each test by telling the
children that some Kana (or Kanji, etc.) would be shown on the screen at the
front of the class. The task was to look carefully and to try to remember
each item as it appeared. As in Experiment 1, the children were given a
response sheet that contained boxes numbered from 1 to 70, and were told that
they would use the sheet to mark each new item with an "X" and each previously
seen item with an "O." The ten inspection items were then presented at a rate
of approximately one every five seconds, followed by the 70 test items at the
same rate of presentation. The instructor said the number of each test item
as it appeared on the screen, to ensure that the children would not lose their
place on the response sheet.

Results and Discussion

As in Experiment 1, the data were scored in terms of the total number of
correct responses given to each type of material, and scores were considered
in relation to children's designation as good, average, or below average in
reading ability. Although performance on the various types of material was
always significantly better than chance (t(99) = 29.0 for nonsense designs;
t(99) = 20.4 for faces; t(99) = 50.5 for Kanji; t(99) = 40.3 for Kana; all of
which are significant at the p < .001 level), the level of performance varied
among children and across the various types of items. Average scores were
highest for Kanji (67.5 items, 96.5% correct), followed by Kana (62.7 items,
89.5% correct), abstract figures (59.0 items, 84.3% correct) and faces (57.0 items,

The main purpose of Experiment 2 was to determine the relation between
reading ability, memory for nonlinguistic visual information, and memory for
Kana and Kanji. To that end, the first analysis concentrated on comparing the
mean performance of the 15 good and the 10 poor readers. This has the virtue
of permitting a direct comparison of the present results with those obtained
in Liberman et al.'s (1982) study of American children. For convenience, the
American data appear in Figure 1 so that they may be compared with the present
results, which appear in Figure 2.

Turning first to the orthographic materials, it can be seen on the right
side of Figure 2 that the good readers of Japanese achieved superior scores on
both the Kana, t(23) = 18.7, p < .001, and Kanji materials, t(23) = 6.12,
p < .001. Note also that, whereas the good readers had achieved equivalent
scores on the two types of orthographic material (p > .01), the poor readers
were markedly worse on the Kana digraphs than on the Kanji, t(9) = 7.18,
p < .01. Consequently, the extent of difference between children in the two
reading groups is greater in the case of the Kana materials.

As for the two types of nonlinguistic material, the left side of Figure 2
reveals that, unlike the beginning readers of English, whose data appear in
Figure 1, the good readers of Japanese surpassed the poor readers in their
performance on the abstract designs, t(23) = 4.76, p < .01. Like beginning
readers of English, however, the good readers of Japanese did not
significantly differ from the poor readers in memory for faces. Although the
good readers tended to achieve slightly higher scores, the difference failed
to reach significance, p > .1.

Further analysis involved a series of Pearson product-moment correlations
computed to assess the interrelations between children's performance on the
various types of materials employed in Experiments 1 and 2, and the teachers'
ratings of their reading ability. That analysis revealed that performance on
the Kana materials was significantly correlated with performance on the Kanji
materials, r(100) = .19, p < .03, with the teachers' ratings, r(100) = .32,
p < .001, and with performance on the spoken utterances employed in
Experiment 1, r(100) = .27, p < .01. Performance on the Kanji materials was
likewise correlated with the teachers' ratings, r(100) = .36, p < .01, and
performance on the spoken utterances of Experiment 1, r(100) = .30, p < .01.
Neither the correlation between memory for Kanji and teacher ratings, nor that
with memory for spoken utterances was significantly different from the
correlations obtained in the case of the Kana materials.

Figure 1. Mean percentage of correct responses made by good and poor readers
of English on nonsense designs, American faces and printed nonsense
syllables.

Kanji.






There was one difference, however, between memory for Kanji and that for
Kana. Memory for Kanji significantly correlated with memory for the nonsense
designs, r(100) = .j8, p < .03, whereas memory for Kana did not (p > .1).
Thus, the differences between good and poor readers' performance on the
abstract designs would appear to be more closely associated with their
differences in memory for Kanji than with that for Kana.

Further analyses of the data revealed that memory for the abstract
designs was correlated with the teachers' ratings, r(100) = .36, p < .01. It
was also correlated with memory for the faces, r(100) = .26, p < .01. Neither
memory for Kana nor memory for Kanji correlated with memory for faces
(p > .1), and all other correlations failed to reach significance at the .05
level of confidence.
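
The product-moment correlations reported above follow the standard
definition, which can be sketched as below. The function name and the example
values are hypothetical; in particular, coding the teachers' ratings as 1
(poor), 2 (average), and 3 (good) is an assumption made for illustration, as
the text does not state how the ratings were coded.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values: memory scores (items correct out of 70) and assumed
# numeric codes for teacher ratings (1 = poor, 2 = average, 3 = good).
memory_scores = [45, 52, 58, 60, 63, 66]
teacher_ratings = [1, 2, 2, 2, 3, 3]
print(round(pearson_r(memory_scores, teacher_ratings), 2))
```

The same function applies to any pair of measures in the study, such as
memory for Kana against memory for Kanji, because each child contributes one
score per measure.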

Discussion

Different components of memory may accomplish temporary memory for
linguistic and nonlinguistic materials. As is evident from the present
results, and those obtained in previous studies of American children (Liberman
et al., 1982; Mann & Liberman, 1984), children who possess superior skills in
one domain need not possess superior skills in another. Skill in remembering
faces, for example, may have little to do with skill in remembering printed or
spoken words, which agrees with what is known about the memory skills of
adults (Woodhead & Baddeley, 1981).

The present study concerned the types of temporary memory skills that are
most pertinent to children's ability to learn to read Japanese, a language
whose orthography comprises a syllabary (Hiragana) and a logography (Kanji)
instead of an alphabet. The possibility that different memory skills might
make different contributions to early reading success follows from findings
about American children learning to read an alphabetic orthography (Liberman
et al., 1982; Mann, 1984; Mann et al., 1980). Children who are good beginning
readers of English tend to surpass poor beginning readers in use of
phonetic representation (see Mann, 1984, for a review). However, as documented
in Figure 1, good beginning readers of English do not surpass poor beginning
readers in memory for abstract visual designs or in memory for faces.
The benefits of superior use of phonetic representation, and the relative
neutrality of nonlinguistic memory skills, could reflect the narrow demands of
learning to read a phonetic transcription, or the broader demands of learning
to read any written representation of spoken language. It was to decide
between these alternatives that the present study was conducted. It sought to
gain the broader perspective on reading acquisition that is available through
study of children learning to read a syllabary and a logography instead of the
alphabet.

It has been claimed that all children learn to read Japanese (except
those markedly deficient in intelligence), and that reading difficulty is a
problem peculiar to Western children learning to read the alphabet (Makita,
1968, 1974). A recent study, however, offers evidence that reading
disabilities may occur as often in Japan as in America (Stevenson et al.,
1982). Thus it would appear that early reading difficulty can occur in
syllabaries and logographies as well as in the alphabet. With this finding in
mind, the present study focused on a population of second-grade children whom
their teachers rated as good, average, or poor in reading ability.

Confirmation that the children rated as good readers were truly better
readers than the children rated as poor readers can be had from the data that
appear on the right side of Figure 2. The children who were good readers
remembered both the Kana and Kanji materials significantly better than the
poor readers, which would be consistent with the fact that they possess a
superior ability to read each type of character. A further finding about the
reading ability of children in each group is evident in Figure 2. The poor
readers encountered more difficulty in remembering the Kana digraphs, whereas
the good readers were equally accurate on the Kana and Kanji materials. The
fact that the Kana digraphs were nonsense materials could be the source of the
poor readers' problems with these materials, as could be the fact that
children had not had any practice in reading and remembering such materials.
Alternatively, Kana may have been problematic because learning to decode an
orthography that transcribes the abstract, phonological subcomponents of words
could be more demanding than acquiring one that transcribes language at the
level of the word. Future research is necessary to clarify the basis of poor
readers' difficulties with the Kana digraphs.

Let me now turn to the question of whether linguistic memory skills are
related to success at learning to read Kana and Kanji. The answer is
affirmative: children who differ in reading ability tend to differ in the
ability to remember spoken nonsense words as well as in the ability to
remember Kana and Kanji. Moreover, their memory for the nonsense syllables
was equally related to their memory for Kana and Kanji, which implies that the
importance of phonetic representation is not limited to orthographies that
involve some type of phonological transcription. The implication, then, is
that effective use of phonetic representation characterizes superior beginning
readers, whether they are learning to read an alphabet, a syllabary, or a
logography. Reading success in all orthographies may be influenced by the
ability to recode orthographic material into a phonetic representation, which
is consistent with findings that mature readers do so whether the language
they read employs an alphabet or not (see, for example, Conrad, 1961; Levy,
1977; Slowiaczek & Clifton, 1980; Tzeng et al., 1977).



With respect to the relation between memory for orthographic and
nonlinguistic visual materials, there are commonalities between beginning
readers of English and Japanese, but there is also an interesting difference.
The commonality is that, like American children, Japanese children who differ
in reading ability tend not to differ in memory for faces. Thus it cannot be
concluded that good readers possess a superior memory in general. The further
implication is that at least one aspect of nonlinguistic memory skill may
not be particularly relevant to success in learning to read any orthography.
The difference between beginning readers of Japanese and English is that the
good readers of Japanese surpassed the poor ones in memory for the abstract
designs.

Two observations suggest a plausible explanation of the
orthography-specific relationship between reading ability and memory for
abstract designs. The first is that, as can be seen from a comparison of
Figures 1 and 2, Japanese children tended to surpass American children in
memory for the nonsense designs (a mean score of 58 items correct, as compared
to ^9 items correct, respectively). It may be the case that the Japanese
children employed a more effective strategy for remembering these materials.
The second observation concerns the nature of this hypothetical strategy.
During testing, I noted that, unlike American children, many Japanese children
attempted to trace the designs with their fingers, or even with a motion of
their head. My hypothesis is that they were encoding the design into a
graphomotor representation, in much the same way that they might encode an
unfamiliar Kanji character when the teacher first presents it in class. A
graphomotor coding strategy could be encouraged by the teacher's instructions
in Kana and Kanji, as these always involve presenting characters as a sequence
of strokes that the children must copy and memorize. Applied to the abstract
designs, a graphomotor coding strategy can explain why memory for the Kanji
correlated with that for the designs. (However, one would have to explain the
absence of a correlation between memory for nonsense designs and that for
Kana digraphs.) It could further account for the correlation between the
teachers' ratings of reading ability and performance on the nonsense designs.
Finally, it can account for the superior performance of the Japanese children
in general: the American children, by virtue of their education, would have
been less likely to make systematic use of a graphomotor coding strategy as a
means of remembering the nonsense designs, and therefore would have been less
successful.

If these arguments are accepted, the implication is that good beginning
readers of Japanese, in addition to making more effective use of phonetic
representation, may also make more effective use of graphomotor
representation. Further research is needed to confirm the use of a graphomotor
strategy by the Japanese children, and to determine whether graphomotor coding
continues to characterize skilled readers of Japanese beyond the early
elementary grades. Findings that the mature reading of Kanji and Kana is more
disrupted by damage to the left, language-dominant hemisphere than by damage
to the right (Sasanuma, 1975; Sasanuma & Fujimura, 1971) would seem
inconsistent with a view that some nonlinguistic coding strategy is
fundamental to skilled reading of Japanese. Perhaps coding strategies change
with age and reading experience, or perhaps the Kimura figures are processed
differently by all Japanese subjects.

To return to the major findings of this study, two experiments have
revealed that, for second graders who are learning to read Japanese, use of
phonetic representation in temporary memory is pertinent to the ability to
read well and to the ability to remember Kana and Kanji materials for a brief
period of time. Memory for Kana is related to memory for spoken nonsense
words, but not to memory for nonlinguistic materials such as faces and
abstract designs. In contrast, memory for Kanji is related not only to memory
for spoken nonsense words, but also to memory for nonsense designs, which
would seem to imply partial reliance on a graphomotor coding strategy. It can
be concluded that acquisition of both Kana and Kanji, like that of the
alphabet, makes demands on the linguistic component of temporary memory.
Acquisition of Kanji, however, contrasts with that of other orthographies,
insofar as it makes an additional demand on a nonlinguistic component. The
outcome of having to master both Kana and Kanji is that, for the beginning
reader in Japan, both phonetic and graphomotor memory abilities are associated
with early reading success.

The act of reading involves the recognition of individual words and the
integration of the meanings of those words for text comprehension. The
present paper reports ^en/studies with deaf college students, focusing first
on their word recognition processes and then on their short-term memory
representation of those words.

The approach of this paper will be to focus on analytic reading, in which
the reader takes advantage of the linguistic information reflected in the
orthography and performs a grammatical analysis on the words of a sentence,
thus leading to comprehension (Mattingly, 1980). While some have taken the
position that reading need not involve such linguistic mediation, there is a
great deal of evidence in the literature indicating that such analytic
processing promotes acquisition of reading among beginning readers and
facilitates reading (especially of difficult material) for more advanced
readers (Gleitman & Rozin, 1973; Liberman, 1983). Evidence of this sort
provides the motivation for focusing on analytic reading in the present paper.

The orthography of English is an alphabetic writing system that reflects
the morphophonemic structure of the language (Chomsky & Halle, 1968; Klima,
1972; Venezky, 1970). Hearing college students exploit this structure in the
reading of words, even in the reading of those words that are familiar
(Brooks, 1977; Massaro, Taylor, Venezky, Jastrzembski, & Lucas, 1980).
Similarly, deaf college students are sensitive to orthographic structure
(Hanson, 1983; Hanson, Shankweiler, & Fischer, 1983), and they take advantage
of this structure to facilitate word recognition (Hanson, 1982b, 1983). For
example, in a study in which deaf students were presented printed letter
strings that were orthographically regular (e.g., REMOND, SIFLET) or
orthographically irregular (e.g., RDEMNO, EFLSTI), these deaf students were
found to recall letters of the orthographically regular strings more
accurately than those of the irregular strings (Hanson, 1983). Similar results
were obtained in an experiment investigating the recognition of fingerspelled
words in which deaf adults were asked to report the letters of fingerspelled
strings that were
orthographically regular (e.g., S-N-EH^-G-L-I-N) or orthographically irregular 
(e.g., F-T-E-R-N-A-P-S). They more accurately reported the letters of the 
regular strings (Hanson, 1982b). 

This superior performance on the regular strings suggests that skilled
deaf readers, like skilled hearing readers, are sensitive to orthographic
structure. As further support for this suggestion, nearly all of the incorrect
letter reports in the experiment on fingerspelling were found to be
orthographically permissible. Particularly striking was the finding that for
the orthographically irregular strings, many of the errors in reporting
letters tended to result from the subjects' attempts to regularize the
spelling of these strings. For example, in recalling the string
F-T-E-R-N-A-P-S, some subjects omitted the letter that made the sequence
irregular and wrote fernaps. Others added a letter to make the sequence
regular and wrote afternaps, while others rearranged the letters of the
sequence and wrote ferntaps.

So far, this discussion has concentrated on reading at the level of the
single word. But, in addition to the ability to deal with the structure of
individual words, reading requires holding words and their order of arrival in
memory long enough to permit sentence comprehension. Short-term memory
studies have been used to examine the nature of the internal representation
(or code) used by deaf readers to mediate this comprehension process.

In studies of short-term memory with deaf college students, two primary
findings have emerged. The first has been that these students, particularly
the better readers, tend to use a speech-based code in the short-term
retention of printed English words (Hanson, 1982a; Lichtenstein, in press).
These results are consistent with Conrad's (1979) finding that the better deaf
readers among high school age students tend to use a speech-based code. These
results extend Conrad's work, however, in an important way: While the students
tested by Conrad attended schools that were strictly oral in their educational
approach, the college students tested in these more recent studies have had
manual language experience, some even being native signers of American Sign
Language. The second finding to emerge from the short-term memory studies
with college students has been that deaf readers have difficulty in using a
speech code. Even deaf readers who do use it, use it less efficiently than
hearing readers (Hanson, 1982a; Lichtenstein, in press).

Given this difficulty in using a speech-based code, why might the better
adult readers tend to prefer it over a manual code? A partial answer to this
question is suggested by research on the retention of serial order
information. Since English is a language in which word order carries critical
syntactic information, the retention of word order during sentence
comprehension is essential. In an experiment comparing the memory of deaf and
hearing college students for sequences of printed English words, deaf college
students had poorer recall only when they were required to recall the words in
their order of occurrence; the deaf students were comparable to the hearing
students when recall of order was not required (Hanson, 1982a). Thus, the deaf
students had specific difficulty in retaining information about the order in
which words were presented. The extent of this difficulty appears to be
related to deaf individuals' ability to use a speech code; in tests of
short-term memory, those students with the larger memory spans have shown the
greatest use of a speech code (Conrad, 1979; Hanson, 1982a; Lichtenstein, in
press). These results suggest that the retention of word order information
depends on the ability to use a speech code.

In summary, this research suggests that deaf college students are quite
proficient at using the orthographic structure of words in word recognition,
but that even these readers experience persistent difficulties in the
short-term memory processes that mediate comprehension. The difficulties
appear related, at least in part, to inefficient use of a speech-based code.

The question remains as to the nature of the speech-based representation
used by deaf readers and how this representation is developed. Deaf readers
could acquire information about a speech-based code from the orthography or
through speaking and lipreading. It may be the case that deaf readers'
ability to use some form of speech-based code is not well reflected in the
intelligibility ratings of their speech. These intelligibility ratings are
based on listeners' ability to understand the deaf speakers' utterances, not
on the deaf individuals' ability to utilize speech in reading. Further
research needs to be directed at determining how an effective speech-based
code might be acquired by deaf individuals for the purpose of reading.

DISCOVERING MESSAGES IN THE MEDIUM: SPEECH PERCEPTION AND THE PRELINGUISTIC
INFANT*



Catherine T. Best†



To parents and those who work with infants, one of the most remarkable
developments during the first year of life is the rapid growth of vocal
communication skills prior to language. Initially, the infant communicates by
cries, but during the second half-year, the speechlike sounds of babbling
emerge. By twelve months, babbling not only conveys feelings and needs to
others but may also express infants' observations of regularities in the
events and objects of their world.

The sounds that infants make are but one facet of their progress toward
verbal communication. More hidden from our view, yet also important, is their
perceptual grasp of the speech around them. In this chapter on infant speech
perception, speech will be considered as the medium through which language is
expressed vocally, much like the sounding of musical instruments is the medium
through which a symphony is expressed. Both language and symphony are
structurally complex systems, with many levels of concurrent organization,
which are reflected in the organization of the medium. Speech carries the
multiple messages that can be conveyed verbally, and hence carries information
about the complex structural organization of vocal communication, which
includes not only the structure of words and sentences but also broader
aspects, such as stress, conversational rhythm, and voice characteristics
(e.g., speaker gender, age, identity, and emotional state). It also carries
information about finer grained structures such as consonants, vowels, and
syllables, and the vocal gestures that produce them.

A listener's knowledge about the organization of vocal communication sets
limits on which messages can be recognized in speech. That is to say, one
must know or learn which structures to listen for within the wealth of
information that the medium reflects about the events forming a vocalization
(stated in the spirit of E. J. Gibson, 1977, and J. J. Gibson, 1966). Although
infants may recognize some aspects of nonlinguistic structures in vocal
communication, they are limited in their recognition of language structure as
such. To use any language, then, they must still discover the existence and
meaning of many of the messages in speech, particularly those of words and
phrases. But where do they start and how do they proceed in their discoveries
during the first year? For infants who have not yet discovered the word, what
messages are perceptually available to them in speech that might expedite that
discovery?

The central concern of this chapter is with the nature of messages that
prelinguistic infants may hear in human speech, at the level of the finer
grained structures that we know as consonants and vowels. This issue has been
addressed in the last twelve years of research on infant speech perception.
Two basic themes about the information that infants perceive in speech have
emerged. Both themes presume that some innate mechanism(s) of the auditory
perceptual system fully account for infant speech perception, thus implying a
mechanistic view that the young perceiver's role in seeking information in
speech is rather passive. According to the first theme, infants possess
species-specialized perceptual mechanisms that are tuned to linguistic
contrasts among phonemes, those individual consonants and vowels we adults
often associate with letters or letter combinations in words. The other theme
proposes that infant speech perception is shaped by the auditory system's
response to acoustic components of the stimulus; that is, the perceptual
process is stimulus-bound and intrinsically neutral with respect to speech
versus other sounds. These neutral acoustic attributes include the bits of
noise and frequency changes, interspersed with silent gaps and humming or
buzzing, which comprise a physical description of the speech signal.

It will be argued here that neither theme adequately explains how infants
perceive the speech they normally hear during development. In their stead,
the features of a third perspective based on ecological considerations (see
also Fowler, Rubin, Remez, & Turvey, 1980; Summerfield, 1978) will be
outlined, which posits a more active, information-seeking role for the
perceiver. This alternative view is that infants actively attend for
information in the speech medium about the natural forces that structured it,
particularly how it was shaped by the human vocal tract. For the sake of
simplicity, this perceptual focus on how speech is structured by its vocal
source will be referred to as speech source perception. This term is offered
rather than articulatory perception (see Studdert-Kennedy, 1981a; Summerfield,
1978), in order to encompass not only the articulatory gestures of the mouth
and tongue but also the anatomical structure of the human vocal tract and its
variations according to speaker characteristics (e.g., sex, age, and emotional
state). As will be argued, a theory focused on the vocal tract sources of the
speech medium's acoustic structure has greater potential than the other two
themes for explaining how and why infants might begin to develop language
based on the speech they hear. In short, it would provide the infant a more
direct avenue by which to discover and produce words.

But how is this vocal source information conveyed in speech? Simply
stated, for now, the acoustic properties of sounds are determined by the
structure and movements of the sound-making object, including the human vocal
tract (Fant, 1960; Flanagan, 1973). Thus, speech carries information about
vocal configurations and gestures (Cooper, 1981; Dudley, 1940; Paget, 1930);
the speech source view proposes that this vocal tract information is available
to perception. This view will be explained in more detail, with support from
recent research with adults and infants. It will also be suggested that the
specialization of the human left cerebral hemisphere for language reflects an
attunement to detect information in speech about the articulatory gestures of
the speaker's vocal tract. The chapter concludes by noting that speech source
perception is only one contribution to the infant's development of language.
In order to discover words and develop language, infants must also learn about
the broader aspects of language from the natural context in which speech
occurs.

1. Setting the Context

1.1 Language and the Prelinguistic Infant

Prelinguistic infants, by definition, do not yet produce true words;
that is, their vocalizations apparently do not refer to objects and events in
the way that the words of the adult language community do. It should be noted
that it is not possible to draw a sharp chronological division between
prelinguistic and linguistic periods in the development of either speech
production or speech perception. Generally, however, the first year of life
is considered to be prelinguistic.

Two complementary and interdependent questions provide a guide for under- 
standing the prelinguistic antecedents of language development: What is vocal 
language that a prelinguistic infant may come to know it? and What is the 
prelinguistic infant that she or he may come to know vocal language? (adapted 
from McCulloch, 1965). The next few sections will focus on the former ques- 
tion, to frame the subsequent discussion of infant speech perception research. 
They will describe the basic characteristics of speech that are important for 
understanding the task facing a prelinguistic perceiver. Once the stage has 
been set, the three theoretical views about the way infants process speech
will be described in greater depth. 

1.2 What Is Vocal Language

What type of information or messages does speech carry that prelinguistic
infants might perceive? Prelinguistic infants do not yet produce words, nor
do most infants under 9-10 months yet comprehend spoken words (e.g.,
Lenneberg, 1967). Thus, we should not expect younger infants to perceive any
information that is defined by word meanings. Nevertheless, some coherent
information in human speech must be available to prelinguistic infants, for
they do eventually discover words.

To discover words in the speech directed toward them, presumably infants
would, in part, have to (a) disembed from continuous speech the recurring
subpatterns that become familiar words; (b) recognize the invariance in the
pattern of a word, across the variations in acoustic detail that occur when it
is produced by different speakers or in different contexts; and (c) recognize
the relevant differences that do specify meaningfully different patterns. And
in order to produce words, they would also have to recognize how to imitate or
approximate subpatterns from a language-user's speech, even though the
acoustic output of their own smaller and differently proportioned vocal tracts
differs substantially from that provided by their older models (Goldstein,
1979; Lieberman, Harris, Wolff, & Russell, 1971).

In language research with adults, phonemes (consonants and vowels) have
often been considered to be the building blocks of words. According to that
perspective, the achievement of the perceptual tasks previously listed would
seemingly be founded on perception at the phonemic level of speech. To date,
most infant speech perception research has focused on phonemes or their
combinations in syllables, on the apparent assumption that perception of these
subword units must be precursory to the perception of words.

Language users easily recognize words and phonemes when listening to
conversational speech. However, these recognitions are no small feat for
prelinguistic listeners, who lack a language system that could help them solve
some apparent puzzles in adult speech perception. The source of these
puzzles, which lies in the acoustic characteristics of speech, is discussed
next. (For more extensive discussions of adult speech perception, see Fowler
et al., 1980; Liberman, 1982; Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967; Pisoni, 1978).

1.3 The Puzzles in the Acoustic Shape of Speech 

To imagine the infant's difficulty, recall listening to a stretch of
conversation in an unfamiliar language. Foreign speech typically seems like a
relatively continuous flow of sounds, in which the quality of some phonemes
may be unfamiliar (e.g., the /r/ of French or Spanish), and the boundaries of
individual words often may be undecipherable. This can make it difficult for
a listener to recognize a foreign word uttered in different sentences and by
different people. The infant's problem is compounded because, in contrast to
a language user, infants presumably do not know what words are, so this
concept cannot guide their discovery of word boundaries in the flow of
conversational speech. Similarly, a language user may have difficulty
recognizing the precise qualities of an unfamiliar foreign phoneme when it is
uttered in different words and by different people. Yet, relative to mature
language users, to infants all of the phonemes occurring even in their native
language environment would be comparatively unfamiliar. Infants also
presumably lack certain concepts about the linguistic role of phonemes that
may guide the language user's recognition of individual phonemes in
conversational speech.

1.3.1 Acoustic continuities and discontinuities in running speech. One
reason a sentence spoken in an unfamiliar language sounds indivisible is that
utterances in natural conversation are a fairly continuous stream of sound.
This is partly attributable to the cohesive intonation, or pitch contour of
the voice. But it results also from the vocal-tract movement trajectories
that interconnect the adjacent words in sentences or phrases. In
conversation, speakers rarely pause between words, instead usually moving in
connected fashion from one to the next, just as a runner usually adjusts to
changes in terrain or direction without pausing between step cycles.
Sometimes neighboring words in informal speech even become contracted (e.g.,
"what are you ..." becomes "wadaya ..." or "whatcha ..."). Thus, the raw
acoustic properties of conversational speech do not always reveal clear
boundaries between words.

Conversely, the vocal tract can make other relatively rapid adjustments
that do cause obvious acoustic discontinuities. These breaks can occur within
as well as between words, however. For instance, in the word "so" there is a
rather sudden change from the lack of vocal-cord vibration during the
voiceless /s/ to the onset of vibration for the voiced sound /o/. This causes
an acoustic break between the noiselike, aperiodic hiss of the /s/ and the
voiced acoustic periodicity of the vowel. The paradox is that a knowledgeable
listener perceives these discontinuities as an integral part of a word, where
appropriate, rather than as breaks between words.

The speech properties just discussed can be seen in the spectrogram in
Figure 1. A spectrogram is one way of visualizing the acoustic components of
speech. As indicated earlier, these acoustic characteristics are determined
by the structure and movements of the vocal tract (Fant, 1960; Flanagan,
1973).




Figure 1. Spectrogram of the sentence "What are you saying to me, little
girl?" spoken by a man to a young baby.

The spectrographic analysis shows the relative acoustic intensity (darkness
level) of the frequency components in the speech signal (ordinate) as they
change over time (abscissa). The wide, horizontally varying bars of increased
density within the dark vertical striations are called formants, the lowest
being referred to as the first formant (F1), and they correspond to the
time-varying resonant frequencies of various relatively hollow spaces or
chambers in the vocal tract. In the vowel "ee," for example, a small
resonating chamber is formed at the front of the mouth between the edges of
the tongue blade pressed against the upper teeth and the close approach of the
soft palate to the base of the tongue, while a relatively large resonating
chamber forms at the back of the mouth behind the base of the tongue. This
results in a low-frequency F1 and a high-frequency second formant (F2).
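
The kind of analysis that underlies a spectrogram can be sketched in a few
lines. The following is only an illustrative short-time Fourier analysis in
Python, not the procedure used to produce Figure 1; the function names, the
frame length, the hop size, and the synthetic 1000 Hz test tone are all
assumptions introduced for the example. Each row of the result is one time
slice (abscissa), each column is one frequency band (ordinate), and the stored
magnitude plays the role of darkness.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one windowed frame."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, sample_rate, frame_len=64, hop=32):
    """Return (freqs_hz, rows); rows[i][k] is the intensity of bin k in frame i.

    frame_len and hop are illustrative defaults, not values from the report.
    """
    # Hann window: tapers each frame so that its edges add little spurious energy.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    freqs_hz = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    return freqs_hz, rows


# Synthetic stand-in for a single steady resonance: a 1000 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
freqs_hz, rows = spectrogram(tone, SAMPLE_RATE)

# Frequency of the strongest ("darkest") band in each time slice.
peak_hz = [freqs_hz[max(range(len(row)), key=row.__getitem__)] for row in rows]
```

For a steady tone the strongest band stays at one frequency in every slice;
in real speech the corresponding bands (the formants) move as the vocal tract
changes shape.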
The vertical striations in which the formants appear represent the
individual energy pulses emitted by each vocal-fold vibration. The more
closely packed the striations are, the briefer the periods between pulses and
hence the higher the pitch of the voice. In Figure 1, the man raised his
voice pitch substantially for the word "saying," and then dropped it for "(to)
me little," raising it again toward the end of "girl." This degree of pitch
modulation is more exaggerated than normal, and often occurs when parents talk
playfully to their babies (Kaye, 1980). The dappled, nonstriated patches
represent aperiodic acoustic noise produced by air turbulence at some point in
the vocal tract, as with the tongue-tip constriction near the upper front
teeth for the /s/ of "saying."
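
The relation between striation spacing and pitch is simply that pitch is the
reciprocal of the period between pulses. A minimal Python sketch follows; the
function name and the pulse times are hypothetical and are not measurements
taken from Figure 1.

```python
def pitch_hz(pulse_times_s):
    """Estimate voice pitch as the reciprocal of the mean gap between pulses."""
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses to measure a period")
    span_s = pulse_times_s[-1] - pulse_times_s[0]
    return (len(pulse_times_s) - 1) / span_s


# Hypothetical pulse times in seconds, not measurements from Figure 1.
widely_spaced = [0.000, 0.008, 0.016, 0.024]   # one pulse every 8 ms
closely_packed = [0.000, 0.004, 0.008, 0.012]  # one pulse every 4 ms

low = pitch_hz(widely_spaced)
high = pitch_hz(closely_packed)
```

Halving the gap between pulses doubles the estimated pitch, which is why the
more closely packed striations in the figure correspond to the raised voice.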

1.3.2 Continuities and discontinuities among and within phonemes. Since
infant speech research has focused primarily on phonemes, the sentence in
Figure 1 is printed beneath the spectrogram for reference. The letters for the
vowels and consonants are roughly lined up under the midpoint of their portion
of the acoustic signal. The match between phonemes and acoustic information is
only approximate, however, because of inherent difficulties in determining the
acoustic span of a phoneme. At times, discontinuities appear to fall within
rather than between phonemes, as was true at the level of words in a sentence.
For instance, the /t/ in "to" encompasses an aperiodic noise burst, the
following brief period of breathy aspiration, and the subsequent rapid
transitions of the formant frequencies (i.e., the upglide at onset of the
lowest formant and downglides in the pale higher formants). Again, the
knowledgeable listener may paradoxically perceive acoustic discontinuities as
integral to a unitary phoneme.

Another difficulty with matching phonemes to acoustic segments derives
from the temporal overlap of adjacent phonemes. Speakers do not simply finish
one phoneme and, at precisely that time, begin the next. Interconnected
trajectories characterize the neighborly relations not only of words but also
of vowels and consonants, for example, the trajectory in "me" from the lips
being closed for /m/ to the lips being open and tongue blade high for "ee."
The structure of the vocal tract does not permit an instantaneous change from
one configuration to another, just as a runner's leg cannot instantaneously
change from flexed to extended position. In the sample sentence, not only is a
discrete boundary missing between the words "to" and "me," but there is none
within "to" to define the end of /t/ and the beginning of the vowel. Yet a
perceiver familiar with the language can identify individual phonemes as well
as individual words.

The trajectories between target vocal-tract configurations, however,
account only partially for the acoustic interconnection between phonemes.
Often, vocal-tract adjustments for a phoneme begin one to several segments
ahead of it, or persist one to several segments beyond. In other words, there
is some coarticulation among nearby phonemes (e.g., Bell-Berti & Harris, 1979;
Fowler, 1980). While pronouncing the /t/ in "too," a speaker usually is
already rounding his lips appropriate for the "oo." This lip-rounding is not a
standard property of /t/; for the /t/ in "tee," the corners of the lips are
instead pulled back slightly for the following "ee."

1.3.3 Phonetic context effects and acoustic variability. Coarticulation
among phonemes causes the acoustic characteristics of any item to be
assimilated to its neighbors. The articulatory difference between the two
/t/'s results in "too" beginning with a somewhat lower frequency noise burst
than "tee." The paradox or puzzle is that, although the perceiver recognizes
an invariant identity for a vowel or consonant across various phonemic
contexts, there is no clearcut invariance in its raw acoustic properties.

Movement trajectories also contribute to this acoustic variability
problem. Their shapes are determined by the vocal configurations they
interconnect and there are rarely definable boundaries in conversational
speech between a static configuration and a trajectory into or out of it.
Figure 1 indicates that, because of the difference in the surrounding
phonemes, the first and the last /l/ in "little" are acoustically different,
both in the flanking formant trajectories and in the exact frequencies of the
flatter formants midway through the "segment."

1.3.4 Vocal tract variations and perceptual normalization. Not
illustrated in the figure is a broader problem of acoustic variability: the
acoustic contextual variation caused by different speakers. Of importance are
the differences found between males and females, or between children and
adults. On the average, female vocal tracts are smaller than those of males,
which biases the acoustics of female speech toward higher frequencies in voice
pitch and in formant frequencies. More important, though, are the age and
gender differences in proportional relations among vocal-tract areas. The
ratio between the distance from the vocal cords to the base of the tongue,
versus the distance from the lips to the base of the tongue, is greatest for
adult males and smallest for young infants (Goldstein, 1979; Lieberman,
Harris, Wolff, & Russell, 1971).

Because formant frequencies are determined by the sizes of the vocal
resonating chambers, these vocal-tract ratio differences cause age and gender
differences in the proportional relations among formant frequencies for a
given vowel. It has been impossible thus far to derive a simple mathematical
formula for the formant frequency relations of a vowel produced by
proportionally differing vocal tracts. In other words, there is no invariant
acoustic description of formant frequency relations across men's, women's, and
children's utterances of the vowel (Bernstein, 1981; Broad, 1981; Kent &
Forner, 1979). The puzzle is that listeners, at least those familiar with the
language, immediately hear the vowel's identity across a variety of vocal
tracts differing in size and proportion. This perception of constancy in the
face of speaker-specific acoustic variations has been referred to as the
vocal-tract normalization problem.
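
To see why proportional differences matter, it helps to look at the case in
which they are absent. The Python sketch below uses a standard textbook
idealization that is not part of this chapter: the vocal tract treated as a
uniform tube closed at the vocal cords and open at the lips, whose resonances
follow the quarter-wavelength rule F_n = (2n - 1)c / 4L. The function name,
the speed-of-sound constant, and the two tube lengths are assumptions chosen
only for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value for warm, moist air


def tube_resonances_hz(length_cm, count=3):
    """First `count` resonances of a uniform tube closed at one end.

    Quarter-wavelength rule: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


# Illustrative lengths only; neither figure comes from the report.
adult_like = tube_resonances_hz(17.5)
infant_like = tube_resonances_hz(8.75)

# When only the overall length differs, every resonance is rescaled by the
# same factor.
ratios = [short / long for short, long in zip(infant_like, adult_like)]
```

In this idealized case a single scale factor would map one speaker's formants
onto another's. The difficulty described above arises because real vocal
tracts differ in the proportions of their chambers as well as in overall
length, so no single factor of this kind applies to all formants.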

1.3.5 Summary. These acoustic properties thus pose a number of
difficulties for the perceptual capture of words or phonemes from
conversational speech, even for adults listening to their native language.
These difficulties can cause a sentence spoken in an unfamiliar language to
sound like a rather undivided flow, when the listener is not prepared to handle
them. However, when adults listen to their native language, they can identify
discrete phonemes and words. Most likely, this is because they already know
the phonemes and many of the words in their language, as well as the
permissible ways by which items of either type can combine. Infants, on the
other hand, do not have this knowledge of language. Yet they must be able to
"solve these puzzles" in perceiving speech in order to ultimately discover
words, since the words directed to infants are usually embedded in phrases or
sentences (Kaye, 1980) and are presented to them by a variety of people in
different speech contexts. What might infants perceive in speech, at the level
of the phoneme, that could help them to recognize discrete words within the
flow of sound? For consideration of this question, the discussion now turns
to the infant speech perception literature.

2. The Foundations of Infant Speech Perception Research

Research on infant speech perception has largely been guided by two
theoretical approaches to one overriding issue and its underlying assumptions,
as indicated earlier. Although the questions and discussion presented thus
far have been oriented around the eventual discovery of words by infants who
initially have very limited knowledge about vocal communication, this has not
been the major issue in research and theory on infant speech perception.
Instead, the primary theoretical focus has been on how infants solve the
acoustic puzzles of phoneme perception (e.g., Jusczyk, 1981a, 1981b). Its
main underlying assumptions have been that (a) the basic perceptual unit in
speech is phoneme-sized, (b) the speech percept derives from intraperceiver
transformation(s) of the acoustic properties of the stimulus, and (c) the
source of the transformation is an innate mechanism of the auditory system.

Much of the research has been generated by a controversy over the nature
of the transformation(s) and supporting mechanism, which can be traced to a
similar controversy in experimental work on adult speech perception. On one
side is the phonetic interpretation of infant speech perception, which posits
that the perceptual mechanism is uniquely human by nature and differs
qualitatively from the means for perceiving other sounds. Proposals about the
exact properties of the specialized phonetic mechanism have ranged from a
comparator that matches incoming speech sounds to the neuromotor commands for
producing them (the motor theory of speech perception: Liberman et al., 1967)
to innate categories of linguistic features of phonemes (e.g., Eimas,
Siqueland, Jusczyk, & Vigorito, 1971) that may be mediated by innately tuned
neural feature-detectors (e.g., Cutting & Eimas, 1975).

On the other side of the controversy is the psychoacoustic approach,
which presumes the machinery for the speech-to-percept transformation to be
neither uniquely human nor limited to the perception of speech. In the
psychoacoustic view, the general organization of the mammalian (or primate)
auditory system yields an invariant stimulus-bound response whenever a given
acoustic property occurs, regardless of the class of sound (e.g., speech
vs. nonspeech) to which the individual is listening.

<- r - ./ 'j 

In the following summary and interpretation of the literature, it will be
argued that the psychoacoustic-phonetic controversy in infant speech
perception research is misguided. Both views are inadequate because they fail
to consider the relation between infant and language. Following that review
and discussion, a promising alternative theoretical perspective on infant
speech perception will be described: the ecologically motivated (e.g.,
J. Gibson, 1966; Summerfield, 1978) speech source perception view outlined
earlier. But at this point, the issue that has guided existing infant speech
perception research, the psychoacoustic-phonetic controversy, must be placed
in proper historical perspective.


2.1 The Empirical Beginnings

Research on infant perception of phonemes began with two reports in 1971,
both of which gave a phonetic interpretation to the underlying processes.
Each study employed a variant of the habituation paradigm to map the limits of
infants' phoneme categories via their discriminations of syllable pairs
differing in initial consonant. One of the studies found that 5- to
6-month-olds can discriminate natural utterances of /ba/ and /ga/ (Moffitt,
1971). The consonants in the tested syllables are both voiced stop consonants
(along with /d/; the voiceless stops are /p/, /t/, and /k/), but they differ
in place of articulation. Since the infants discriminated between consonants
that differed solely in place of articulation, the author concluded that
"linguistic-perceptual capacities are present during early life" (p. 717).

The other study (Eimas et al., 1971) has received the preponderant attention
in subsequent research. In this study, 1- and 4-month-olds were presented
with computer-synthesized versions of /ba/ and /pa/, which differ in the
articulatory property called voice onset time (VOT). That is, in the
voiceless /p/ of /pa/, the vocal cords begin vibrating later with respect to
the lip-opening gesture than is the case with the voiced /b/ of /ba/.
Therefore, /p/ has a longer VOT than /b/. The acoustic consequences of
articulatory differences in VOT are many (Lisker, 1978), but research
attention has primarily focused on the time difference between the
consonantal noise burst and the onset of periodic voicing, referred to here
as acoustic VOT. It is usually confounded with other acoustic differences
between a voiced-voiceless consonant pair.

Computer synthesis was used in the Eimas et al. study to produce a
systematic series or continuum of syllables, which varied in equal-sized steps
along the acoustic VOT dimension. Such acoustically controlled continua
usually cannot be produced by a human speaker because of mechanical
constraints on possible vocal tract movements (although in the case of stop
voicing, human speakers can produce a range of different acoustic VOT values).
Adult listeners typically fail to hear the gradual steps of acoustic change
along synthetic continua between two contrasting consonants. Instead, they
identify all stimuli as exemplars of one or the other phoneme category, and a
sharp boundary on the continuum separates the two perceptual categories.
Adult listeners also discriminate between acoustically different pairs of
synthetic stimuli much better when the members are from different phoneme
categories than from the same category. This pattern of identification and
discrimination results has been termed categorical perception (see Figure 2).
It was originally taken as evidence for a specialized phonetic mode of
perception, since the nonspeech continua that had been similarly studied were
perceived continuously (i.e., no clear labeling boundary or no clear
performance peak in discrimination ability) rather than categorically (e.g.,
Liberman et al., 1967; Repp, 1982).
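
The idealized pattern just described can be sketched numerically. The
following is a minimal illustration only, not a procedure taken from any of
the studies cited: it assumes a logistic labeling function over a
hypothetical VOT continuum (the 25-ms boundary, the slope, and the 10-ms step
size are invented parameters) and a commonly quoted labeling-based prediction
for two-category ABX discrimination, 0.5 * (1 + (p1 - p2)^2). Under those
assumptions, equal-sized acoustic steps are predicted to be discriminated
near chance within a category and well above chance across the boundary.

```python
import math

def p_voiceless(vot_ms, boundary_ms=25.0, slope=0.5):
    """Idealized probability of labeling a stimulus as /pa/.

    Logistic in acoustic VOT; boundary_ms and slope are hypothetical
    values chosen only to give a sharp labeling boundary.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def predicted_abx(p1, p2):
    """Labeling-based prediction of proportion correct in ABX
    discrimination for two stimuli with labeling probabilities p1, p2
    (assumes listeners discriminate only by the labels they assign)."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

# Hypothetical continuum: 0 to 60 ms VOT in equal 10-ms steps.
continuum = list(range(0, 61, 10))
labels = [p_voiceless(v) for v in continuum]

# Compare stimuli two steps (20 ms) apart all along the continuum.
pairs = [(continuum[i], continuum[i + 2]) for i in range(len(continuum) - 2)]
predicted = [predicted_abx(labels[i], labels[i + 2])
             for i in range(len(continuum) - 2)]

for (a, b), p in zip(pairs, predicted):
    print(f"{a:2d} vs {b:2d} ms: predicted proportion correct = {p:.2f}")
```

In this sketch the pairs that straddle the assumed boundary are the only ones
predicted to rise well above chance, which is the "performance peak" that
defines categorical perception in the idealized case.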

To determine whether infants also perceive speech categorically, and by
presumption phonetically, Eimas et al. (1971) tested their discrimination of
synthetic syllables that differed in acoustic VOT, but either did or did not
differ according to American English /pa/ versus /ba/. The within-category
pairs differed by the same magnitude of acoustic VOT as did the
between-category pairs. The infants discriminated the between-category
difference much better than the within-category difference, in agreement with
the adult phoneme boundary. This finding led the authors to conclude that
infants have an innate capacity to perceive speech linguistically, that is,
in terms of adult phoneme categories.

Figure 2. A schematic diagram of ideal results of categorical perception
tests.

These two early studies were scientifically intriguing, and encouraged a
profusion of infant speech research that still continues. Some researchers
essentially accepted the premise that the means by which acoustic properties
are translated to a percept is speech-specialized, that is, specifically
phonetic in nature. They went on to explore the range and nature of the
phoneme contrasts that young infants discriminate. As the
psychoacoustic-phonetic controversy indicates, however, other researchers
took issue with the phonetic perspective. The studies of this latter group
have attempted to show that categorical perception is due to general
psychoacoustic mechanisms, which respond to particular acoustic attributes
whether they appear in speech or nonspeech sounds, but which occur most
frequently in speech (e.g., Blumstein, 1980; Stevens & Blumstein, 1978).

2.2 Infant Perception of Phoneme Contrasts

Many of the early studies employed synthetic syllable continua, since
explaining categorical perception has been a key interest in the
psychoacoustic-phonetic controversy. According to the definition of
categorical perception, percentage-correct discrimination of stimuli from a
continuum must show a performance peak at the position of the category
boundary for labeling of those stimuli (see Figure 2) (Studdert-Kennedy,
Liberman, Harris, & Cooper, 1970). Therefore, both identification and
discrimination data are needed to assess categorical perception. However,
because there is no currently acceptable test for infants' identification of
sounds, the conclusions of the infant research are based on discrimination
data only (see Jusczyk, 1981a). Since the infant discrimination data cannot
be compared to infant identification responses, it may be better to refer to
the peaks and troughs in their discrimination of synthetic speech continua by
some other term, such as perceptual boundary effect (Kuhl, 1981b; Wood,
1976).
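
To make the distinction concrete, the sketch below shows what can be
concluded from discrimination scores alone. It is an illustration under
stated assumptions, not a method used in the studies reviewed here: the
scores are invented, the function name boundary_effect and the min_peak_gain
threshold are hypothetical, and real infant measures (e.g., recovery from
habituation) are not simple proportions correct. The point is only that such
data can locate a peak along the continuum but say nothing about how the
stimuli would be labeled.

```python
# Hypothetical discrimination scores for adjacent stimulus pairs on a
# synthetic continuum. These numbers are invented for illustration and
# are not taken from any study.
pair_midpoints_ms = [10, 20, 30, 40, 50]
discrimination_score = [0.52, 0.55, 0.86, 0.58, 0.51]

def boundary_effect(midpoints, scores, min_peak_gain=0.15):
    """Return the midpoint of the best-discriminated pair if its score
    exceeds the mean of the remaining pairs by at least min_peak_gain;
    otherwise return None (no clear peak, i.e., no boundary effect).

    min_peak_gain is an arbitrary illustrative criterion.
    """
    peak_index = max(range(len(scores)), key=scores.__getitem__)
    others = [s for i, s in enumerate(scores) if i != peak_index]
    gain = scores[peak_index] - sum(others) / len(others)
    return midpoints[peak_index] if gain >= min_peak_gain else None

print(boundary_effect(pair_midpoints_ms, discrimination_score))
```

A result of this kind identifies where discrimination peaks, but without
identification data it cannot be checked against a labeling boundary, which
is why "perceptual boundary effect" is the more cautious term.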

A number of other infant studies from the phonetic perspective have
employed natural rather than synthetic speech stimuli, thus focusing on
discrimination of contrastive spoken exemplars rather than on categorical
discrimination. In both types of study, the discrimination data have been
derived from either habituation tests or tests of generalization of
conditioned operant responses, and the subjects have usually been between 1
and 7 months of age (prior to the reported onset of word comprehension). The
consonant contrasts that were tested have usually occurred in
syllable-initial position, which makes phoneme discrimination in natural
speech easier for young children than do medial or final positions
(Schvachkin, 1973).


The categorical discrimination studies have found that infants show a
boundary effect for a variety of synthetic consonant contrasts, with their
discrimination peak usually occurring at or near the adult American English
identification boundary. These contrasts include stop consonant voicing
(specifically, /p/ vs. /b/, and /d/ vs. /t/) as cued by acoustic VOT (Eimas,
1975a; Eimas et al., 1971; Streeter, 1976) or by another naturally occurring
cue, the extent of frequency change in the first formant (F1) transition
(J. L. Miller & Eimas, 1981).

Infants also show a boundary effect for place of articulation differences
between stop consonants (/b/ vs. /d/ vs. /g/) (C. L. Miller & Morse, 1976;
C. L. Miller, Morse, & Dorman, 1977; Morse, 1972; Williams & Bush, 1978),
even when the consonants occur in the middle of vowel-consonant-vowel (VCV)
syllables (Jusczyk & Thompson, 1978) or in the final position of VC or CVC
syllables (Jusczyk, 1977). Infant boundary effects have also been found for
place of articulation distinctions between liquid consonants (/r/ vs. /l/;
Eimas, 1975b). There is an infant boundary effect for place of articulation
differences among fricatives (voiceless: /f/ vs. "th" as in "thanks";
voiced: /v/ vs. "th" as in "that"), although the infants' discriminations may
differ in some respects from adults' (Jusczyk, Murray, & Bayly, 1979).
Infants also show a boundary effect for manner of articulation differences
between /b/-/w/ (same place of articulation, but stop vs. semivowel manner:
Eimas & Miller, 1980a; Hillenbrand, Minifie, & Edwards, 1979) and between
/b/-/m/, although they do not discriminate the latter contrast as
categorically as adults; that is, they show moderate discrimination of the
within-category pairs (Eimas & Miller, 1980b).

Adults perceive synthetic continua between vowel contrasts in a more
continuous or less categorical fashion than consonants, unless the vowels are
severely shortened in duration (e.g., Crowder, 1973; Liberman et al., 1967;
Pisoni, 1973, 1975). Infants show similar effects in perception of the
"ee-ih" vowel distinction (Swoboda, Morse, & Leavitt, 1976; Swoboda et al.,
1978).

Consistent with the boundary effect findings, infants discriminate a
fairly wide array of the natural consonant and vowel contrasts in spoken
English. They discriminate natural stop voicing distinctions (Trehub &
Rabinovitch, 1972), as well as some contrasts that children often do not
produce correctly until late in phonological development (3-5 years), and
which often cause persistent articulatory difficulties: certain fricative
place (/s/-"sh") and voicing (/s/-/v/) contrasts (Eilers & Minifie, 1975;
Eilers, Wilson, & Moore, 1977), the place contrast between the liquid
consonants /w/-/r/ (Eilers, Oller, & Gavin, 1978), and the consonant clusters
/sl/-/spl/ (Morse, Eilers, & Gavin, 1982). Infants also discriminate the
naturally produced vowels "ee" versus "ih" (Eilers et al., 1978), as well as
the "ee-ah-oo" triad, whether they occur in isolation or in CV syllables
(Trehub, 1973).

To summarize, young prelinguistic infants discriminate many consonant
contrasts and often show a boundary effect akin to the adult category
boundary. They even discriminate some consonant contrasts that children
produce late and often misarticulate. Infants also discriminate vowel
contrasts, and do so less categorically than consonants, again like adults.
(For more extensive discussion of the theoretical particulars, see Aslin &
Pisoni, 1980; Cutting & Eimas, 1975; Eilers, 1980; Eilers & Gavin, 1981;
Eimas, 1974a, 1975a; Jusczyk, 1981a, 1981b; Kuhl, 1978, 1980, 1981a; Mehler &
Bertoncini, 1978; Morse, 1978; Trehub, Bull, & Schneider, 1981; and Walley,
Pisoni, & Aslin, 1981).

The performance pattern does not, however, indicate whether infants
discriminate by psychoacoustic or phonetic means. Initially, researchers who
took a phonetic view assumed that the boundary effect was evidence for a
speech-specialized perceptual process. But that assumption could be
questioned, and was submitted to test by researchers on both sides of the
theoretical dichotomy. The alternative posed by the psychoacoustic
perspective was, of course, that the perceptual boundary might be an
attribute of the auditory system's response to the acoustic properties,
rather than the phonemic identities, of the speech sounds.

2.3 Is the Boundary Effect "Phonetic" or "Psychoacoustic"?

A direct test of this question was to see whether infants show a 
discrimination boundary effect for some phoneme-differentiating acoustic
property even when it occurs outside a speech context. Both Morse (1972) and
Eimas (1974b, 1975b) isolated the major acoustic cue that had been manipulated
to produce a place of articulation continuum, by stripping away the other 
acoustic properties that were shared by the contrasting phonemes, and present- 
ed young infants with the isolated cue continuum to discriminate. Morse 
(1972) tested infants with the isolated F2 for the /ba/-/ga/ contrast, whereas 
Eimas tested the isolated F2 cue for /da/-/ga/ (1974b), and the isolated F3 
cue for /ra/-/la/ (1975b). Although in each case the isolated formants were 
the sole acoustic property that distinguished the phoneme contrast, and hence 
were crucial to adult categorical perception of that contrast, outside of
their natural context they sounded like nonspeech "bleats." 

The argument was that if the infant boundary effect reflects a uniquely
speech-related perceptual specialization, it should occur in the perception of
a particular acoustic cue only when that cue actually specifies a phoneme
distinction. The infants in the Morse and Eimas control studies did show a
boundary effect for the full syllables but failed to show one for the isolated
formants, and in fact discriminated the latter very poorly. The authors
interpreted their findings as support for the phonetically specialized nature
of infant boundary effects.

The psychoacoustic perspective offered an alternate interpretation,
however. The crucial stimulus attribute for psychoacoustically based
boundaries could be the interrelation between the distinctive acoustic cue
and the nondistinctive information provided in the other formants (e.g.,
Jusczyk, 1981a). Such a relational attribute would be destroyed by presenting
the distinctive cue in isolation. Therefore, the Morse and Eimas studies
could not definitively resolve the controversy.

A more appropriate nonspeech control should maintain the interrelations
among the acoustic features involved in a phoneme distinction. One such
nonspeech distinction is the difference in risetime, or time from onset of a
sound until it reaches its maximum intensity, between plucked versus
violin-like sounds, which is analogous to the "sha-cha" distinction in speech.
Adults had been reported to perceive the pluck-bow distinction categorically,
even though it is nonspeech (Cutting & Rosner, 1974). Jusczyk, Rosner,
Cutting, Foard, and Smith (1977) extended that finding to infants and
presented it as evidence that infant perceptual boundary effects have a
psychoacoustic basis. However, subsequent replications of the adult study
uncovered a stimulus problem. The acoustic differences among the original
pluck-bow stimuli were not of equal magnitude throughout the continuum; when
this source of acoustic discontinuity was removed, adults no longer perceived
pluck-bow categorically (Rosen & Howell, 1981). Since Jusczyk et al. employed
the original pluck-bow continuum, the infant findings must be questioned
(Jusczyk, 1981a, 1981b).

A second infant nonspeech study was subsequently run, using tone onset
time (TOT) differences between the individual tones of a two-tone chord, which
is an analogue for the acoustic VOT distinction in speech (Jusczyk, Pisoni,
Walley, & Murray, 1980). Adults show a sharp TOT boundary in line with their
boundary for acoustic VOT (Pisoni, 1977). Thus, the phonetic uniqueness of
adult categorical perception has been called into question by the TOT results,
along with similar reports on other nonspeech contrasts (e.g., J. D. Miller,
Wier, Pastore, Kelly, & Dooling, 1976). In the Jusczyk et al. (1980) study,
the infants discriminated TOT differences nearly categorically, leading the
authors to conclude that earlier reported VOT boundary effects may not be
unique to speech perception by infants either.

Nonetheless, the TOT findings also fail to offer a definitive choice
between the phonetic and the psychoacoustic interpretations of infant
perceptual boundaries. In contrast to adults, the infants failed to
discriminate TOT as categorically as acoustic VOT, and the position of their
TOT boundary differed significantly from their acoustic VOT boundary. Jusczyk
et al. (1980) claim that this does not damage the general psychoacoustic
stand, since TOT only partly captures the articulatory VOT distinction (i.e.,
the psychoacoustic key could be some other, untested acoustic attribute of
articulatory VOT). However, the developmental data are at odds with this
logic. The TOT and acoustic VOT boundaries do match for adults, indicating
that a perceptual change must occur between infancy and adulthood. Yet the
infant and adult VOT boundaries match. Therefore, it is the TOT boundary that
changes developmentally, and not the VOT boundary, in contradiction to the
claim that infants come to perceive distinctions via psychoacoustic means
(Jusczyk, 1981b).

In any event, the evidence from the infant nonspeech perception research
is weak. The currently available nonspeech control studies present a
theoretical stalemate between the phonetic and psychoacoustic claims about
the nature of infant speech perception. However, the proponents of the
controversy have argued that the answer may lie elsewhere in the premises of
the psychoacoustic-phonetic distinction. It might be settled by exploring
whether phonetic perception is uniquely human, a claim made by the phonetic
perspective that is rejected by the psychoacoustic perspective.

2.4 Is the Boundary Effect Uniquely Human?

Empirical attacks on the claim that humans are the sole possessors of
categorical phoneme perception have involved assessing whether other mammals
or primates show abrupt shifts in perceptual sensitivity around the phoneme
boundaries. It was reasoned that, if animals showed a boundary effect,
general psychoacoustic factors must then account for categoricity in speech
perception, since by definition infrahumans cannot perceive in a humanly
specialized manner. The relevance of this research to infant speech
perception is that infants and animals are nonusers of language, but only
infants are human and have the capacity to develop human language.

Researchers of animal speech perception have reported boundary effects,
similar to those found with infants, for discrimination of the stop consonant
voicing distinction by chinchillas (a South American rodent) (Kuhl, 1981b;
Kuhl & Miller, 1975; J. D. Miller, Henderson, Sullivan, & Rigden, 1978), and
by rhesus monkeys (Waters & Wilson, 1976), and for the stop consonant place of
articulation distinction by monkeys (Morse & Snowden, 1975; Sinnott, Beecher,
Moody, & Stebbins, 1976). Chinchillas also discriminate the vowels "ah" and
"ee" (Burdick & Miller, 1975), as do dogs (Baru, 1975), and exhibit boundary
effects in go-no-go categorizations of voicing among stop consonants, with
their boundaries falling at the position of human adult boundaries (Kuhl &
Miller, 1978). Thus, the psychoacoustic interpretation is that the
perceiver's knowledge of language is not a necessary precondition for the
boundary effect, and apparently neither is membership in the human species.

Researchers who support the psychoacoustic view of speech perception have
used the animal findings to propose the following picture of the evolution of
speech perception and production: the mammalian auditory system has
specialized notches in sensitivity for certain regions of certain acoustic
dimensions. These psychoacoustic specializations placed selective pressures
on the choice of phoneme contrasts by human languages. The production of the
phonemes chosen must have capitalized on just the acoustic domains that are
most neatly suited to mammalian psychoacoustic specializations (Kuhl, 1981b;
Kuhl & Miller, 1975, 1978; J. D. Miller, 1977; Stevens, 1972). By extension,
human infants possess those same psychoacoustic sensitivities (Aslin & Pisoni,
1980; Jusczyk, 1981a, 1981b; Kuhl, 1978; Walley, Pisoni, & Aslin, 1981),
and their attention is thus captured by the acoustic attributes of the
language in their environment.

However, the claims of this psychoacoustic proposal belie the clarity of
the infrahuman data. The animal data fail in several ways to match those of
human adults. Recall the earlier argument that a determination of categorical
perception depends on data from labeling and from discrimination tests; the
animal research has necessarily relied only on discrimination data. Indeed,
animal discrimination boundaries are less sharply defined than human adult
phoneme boundaries (Kuhl & Miller, 1978). In other words, animals are
noticeably better than human adults at discriminating within-category
acoustic differences in a place of articulation continuum, and worse at
discriminating between-category differences, indicating lowered categoricity
(Morse & Snowden, 1975). The between-species categoricity difference is
consistent with the greater sensitivity of human adults to formant-onset
frequency changes, by a factor of about two (Sinnott et al., 1976). In
addition, only human adults show reaction time increases, large ones in fact,
when making within-category discriminations (Sinnott et al., 1976). Finally,
the absolute position of the human adult category boundary is more stable
than the monkey boundary, in the face of variations in the acoustic range
covered by a synthetic phoneme continuum (Waters & Wilson, 1976). Humans
obviously do show speech-relevant perceptual specializations beyond the
limits of the other mammals tested.

The animal and nonspeech research may suggest that the boundary effect is
not absolutely speech-specific and species-specific. However, there are clear
and unexplained differences remaining between the human adults' perception of
speech and the control studies on animal speech perception and infant
nonspeech perception. Thus, the basic theoretical choice between the phonetic
and psychoacoustic explanations of speech perception, especially in infants,
is still open. If studies of the boundary effect have failed to solve the
psychoacoustic-phonetic quandary, then possibly a more abstract characteristic
of speech perception would (such as the perceptual constancy of phoneme
identity across phonemically irrelevant acoustic variations).

2.5 Phonemic Perceptual Constancy in Infants

Recall the earlier discussion of some puzzles in the fit between speech 
acoustics and perception, particularly the lack of satisfactory acoustic 
descriptions for the invariant identity of a phoneme across different contexts 
of surrounding phonemes (the acoustic variability problem) or as uttered by 
differently proportioned vocal tracts (the normalization problem). In spite 
of these puzzles, adults perceive the identity of a phoneme spoken in widely
different words and by different vocal tracts with seeming immediacy and 
effortlessness. 



To see whether infants show a similar perceptual constancy, Fodor and
colleagues (Fodor, Garrett, & Shapero, 1970; Fodor, Garrett, & Brill, 1975)
trained them to respond operantly to a pair of vowel-differing syllables that
either began with the same consonant (e.g., "pee"-"poo") or began with
different consonants (e.g., "pee"-"kah"). In both conditions, the consonants
differed acoustically because of the change in vowel context. The authors
wanted to assess whether, despite the acoustic variability, the infants
learned the operant response more easily for the consonantal match. The
infants were later tested on a new syllable (e.g., "pah"), to determine
whether they generalized the learned operant response more consistently from
the consonant-matched pair to a new syllable beginning with the same
consonant. The infants did learn and generalize more consistently for
consonantal matches than consonantal mismatches. If they had learned to
associate syllable pairs simply by remembering the pairing of their
dissimilar acoustic properties, they should have responded to the
mismatched-consonant pairs as consistently as to the consonant-matched pairs.
The authors concluded that these prelinguistic infants had maintained
perceptual constancy for consonantal identity in the face of concurrent
acoustic variations, and that this ability must depend "on innately
determined phonological identities" (p. 180).

There are two problems with this claim. Their task was difficult for
infants to learn, and several attempted replications or extensions have
failed. In addition, the psychoacoustic perspective offered an alternative
interpretation: The perceptual constancy might reflect a response to some
(yet uncovered) higher order acoustic invariant shared by the varying
instances of a given consonant (Kuhl, 1980, 1981b). Using a different
technique than Fodor et al., Kuhl and her colleagues found perceptual
constancy in young infants for the vowels "ah" versus "ee" (Kuhl, 1979) and
for the fricatives /f/ versus /s/ (Holmberg, Morgan, & Kuhl, 1977) across
different neighboring phoneme contexts and different speakers. Thus,
consistent with the Fodor et al. report, the infants solved the acoustic
invariance problem. Since perceptual constancy was maintained across
different speakers, the infants also solved the normalization problem. But
whereas Fodor et al. favored the phonetic viewpoint, Kuhl and others favor
the psychoacoustic viewpoint (e.g., Jusczyk, 1981b; Walley et al., 1981), in
part because chinchillas and dogs show perceptual constancy for "ah" versus
"ee" across speakers and pitch contours (Baru, 1975; Burdick & Miller, 1975).
Chinchillas also show such constancy for /t/-/d/ even in different vowel
contexts (Kuhl & Miller, 1975).

The perceptual constancy findings thus indicate that infants can somehow
solve two seemingly knotty acoustic puzzles to reach an important perceptual
aspect of phoneme identity. Once again, the findings apparently do not allow
a theoretical choice between the phonetic and psychoacoustic explanations of
infant speech perception. Also, as will be argued in the next section,
neither is the theoretical choice decided by considerations about the
innateness of infant phoneme perception.

2.6 Innateness of Infant Phonemic Perception Effects

A pervasive notion on both sides of the psychoacoustic-phonetic
controversy has been that a boundary effect or perceptual constancy effect is
innate if infants show it "at the earliest age tested" (e.g., Aslin & Pisoni,
1980; Eimas, 1975a; Eimas et al., 1971; Jusczyk, 1981a, 1981b; Kuhl, 1978).
Curiously, the empirical foundation for this belief includes almost no data
before 1 month, a handful of studies on 2-month-olds, and many studies that
have collapsed data across 1-4 months, 4-6 months, 6-8 months, or 10-12
months. Few have compared different age groups (cf. Best, Hoffman, &
Glanville, 1982; Eilers, Wilson, & Moore, 1977; Werker, 1983; Werker & Tees,
1982).


This view apparently assumes that "an infant is an infant is an infant,"
across at least the first 6 months of life. In studies that averaged over
several months of age, the "earliest age" cannot be trusted since it refers to
the youngest infant they tested, even though group data were reported and
appropriate age analyses were almost never run (but see 1- versus 4-month age
differences in Eimas et al., 1971). Thus, it is nearly impossible to assess
which, if any, perceptual boundaries are innate, or presumably biologically
determined and inborn. All we can note are the ages below which a given
perceptual effect has not been shown; for this review, the conservative
assumption will be that the average age tested is the correct "earliest age"
to show the reported effect.

The literature offers the following observations: First, infant boundary
effects matching the adult findings have been shown only by a mean of 2½
months (e.g., Eimas, 1974b, 1975b; Morse, 1972). Conversely, newborns and
1-month-olds do not discriminate VOT absolutely categorically (Jusczyk et al.,
1979), nor do infants under 3-4 months discriminate phoneme contrasts under
demands on short-term memory (Best et al., 1982; Morse, 1978). These data
hint at a perceptual change sometime between 1 and 2½ months, which is
consistent with widespread biobehavioral and social changes around 6-10 weeks
(e.g., Clifton, Morrongiello, Kulig, & Dowd, 1981; Emde & Robinson, 1979;
Haith, 1979). These biobehavioral changes at 6-10 weeks include vocal
behavior (Oller, 1980; Stark, 1980), suggesting that early changes in speech
perception should be further explored (see Werker, 1983, for an example of
important age changes in speech perception by older infants).


When infant and adult categorical discrimination has been directly
compared, 3-month-olds' (Eimas & Miller, 1980b) and even 7- to 8-month-olds'
performance differs significantly from that of adults (Aslin, Pisoni,
Hennessy, & Perey, 1981; Eilers, Wilson, & Moore, 1976). Differences from
adults, in fact, persist until at least 5-6 years of age (L. E. Bernstein,
1979; Garnica, 1973; Robson, Morrongiello, Best, & Clifton, 1982; Schvachkin,
1973; Simon & Fourcin, 1978; Werker & Tees, 1981; Zlatin & Koenigsknecht,
1976). Therefore, boundary effects cannot be considered innate in some
absolute sense.

Second, perceptual constancy for vowels is only certain as early as a
mean of 2½ months (Kuhl & Miller, 1975). Perceptual constancy for consonants
has not been reported earlier than 4 months (Fodor et al., 1975) or 6 months
(Holmberg et al., 1977; Kuhl, 1980). Thus, arguments for the innateness of
perceptual constancy effects (e.g., Jusczyk, 1981b) should also be held in
check.

The exact timing, causes, and nature (i.e., phonetic vs. psychoacoustic)
of changes in speech perception cannot be inferred from this literature,
however. Even if age had been systematically studied, conclusions would still
be limited by the near-exclusive reliance on discrimination measures. The
coincidence of an adult phoneme boundary and a peak in infant discrimination
is not sufficient evidence to claim "adultlike" perception of the contrast.
As argued earlier, assessment of categorical perception requires both labeling
and discrimination tests. Simple discrimination cannot reveal whether the
distinction was perceived as a phoneme contrast (see also Jenkins, 1980), a
concern that is equally relevant to the animal research. Discrimination
indicates only that some difference was detected. Since phoneme
discrimination is dissociable from phoneme category identification in
aphasics (Blumstein, Cooper, Zurif, & Caramazza, 1977; Riedel, 1981), it is
equivocally involved in aspects of perception that are closer to the meaning
of language than mere acoustic contrast detection. Because language-dependent
perceptual qualities are more likely to change developmentally,
discrimination is inadequate as the sole measure of development in speech
perception.

It is uncertain which, if any, speech perception effects are innate, and
how they might change developmentally prior to the discovery of words. More
crucial for the auditory-phonetic controversy, the following questions remain
unanswered: Even if perception of a phoneme contrast is innate, is that
innately possessed quality phonetic or psychoacoustic in nature? And if
perceptual change does occur during prelinguistic infancy, what is the nature
of the change? Each side of the controversy has provided answers (see Aslin &
Pisoni, 1980; Trehub, Bull, & Schneider, 1981; Walley, Pisoni, & Aslin,
1982; cf. Eilers, 1980; Eilers, Gavin, & Wilson, 1979; Eimas, 1975a), and
the auditory-phonetic choice remains unclear. One potential motive force for
early perceptual changes can be eliminated, however: if they do occur, they
could not have a linguistic motivation from the infant's perspective. This
fact causes difficulty for both sides of the controversy, as the next section
indicates.

The data on categorical perception of nonspeech, animal perception of 
phoneme contrasts, perceptual constancy for phonemes, and innateness of infant 
phoneme perception have thus failed to decide between the psychoacoustic and
the phonetic explanations of infant speech perception. When an impasse as
extensive as this is reached, it is important to consider whether the diffi-
culty is not with the research but rather with the logic of the two theoreti-
cal views themselves.

3. Questioning the Auditory-Phonetic Question

As stated earlier, the tacit assumptions shared by the two sides of the 
dichotomy have generally been (a) that the units of speech perception are pho- 
nemes (cf. Bertoncini & Mehler, 1981; Jusczyk, 1981a); (b) that some 
intraperceiver interpretive process must transform acoustic properties to
phonemic percepts; and (c) that the process is mediated by some specialized
neural mechanism(s). All three assumptions can be questioned, particularly in
relation to the infant's discovery of words. The first assumption requires
that infants segment phonemes from connected speech. Phonemic segmentation
has not been assessed in infants, and is not straightforward even when the
perceiver is a language-user, since young children seem unable to explicitly
segment phonemes (E. J. Gibson & Levin, 1975), as are illiterate adults
(Morais, Cary, Alegria, & Bertelson, 1979). The notion that infants perceive
phonemes can also be questioned on a linguistic level (see discussion in the 
next paragraph). The second assumption entails that the listener shed meaning 
on the presumed meaninglessness of the superficial acoustics of speech, that 
is, the meaning of the stimulus resides solely in the listener and not 
directly in the signal. According to the third assumption, the transformation 
is accomplished by specialized nervous system structures or
information-processing stages. Thus, the psychoacoustic and the phonetic views
regard speech
perception as a mechanistic intraperceiver process. 

We turn now to a more detailed examination of each position in the 
psychoacoustic-phonetic controversy. The main problem with the phonetic view
of infant speech perception is that infants presumably do not have a language
system, having not yet discovered words. Phoneme contrasts cannot be
perceptually available to infants, since they are defined by a language
system, being dependent on word meanings in that system. They represent abstract
relations among speech sounds that are used by the language to convey semantic
differences, as in "pat" versus "bat." Although infants, like adults,
categorically distinguish between /p/ and /b/ (e.g., Eimas et al., 1971), they
do not necessarily perceive such differences as phonemic contrasts (recall
that discrimination is an equivocal measure of phoneme perception).

The phonetic view also has difficulty accounting for how and why infants 
would adjust their perceptual categories to suit the language of their 
environment. Presumably, the evolutionary advantage of innate phoneme
categories is that they would filter out irrelevant within-category acoustic
variations and thereby relieve the perceiver of having to deal with those
unnecessary details. Yet young infants fail to discriminate some phoneme
contrasts existing in certain languages according to the adult categories. How
and why would infants learning those languages later become able to focus on
those innately filtered-out details, in order to adjust their category
boundaries or develop new categories? According to researchers on genetic
evolution (Jacob, 1977), on nervous system function (Rose, 1976), on perceptual
development (Spelke, 1979; Trevarthen, 1979), and on speech perception
(Studdert-Kennedy, 1981b), the most efficient evolutionary solution for
developmental adaptation to stimulus environments, such as that provided by the
native language, is not an array of innate mechanisms that are tightly tuned to
specific stimulus values, but instead a more flexible attunement to detect the
range of stimulus values that could occur. Thus, any specialization we have
for perceiving speech would have to be sufficiently flexible to adapt to the
specific phoneme contrasts of one's own particular language.

A major drawback of the psychoacoustic view is the argument that the
object of speech perception is intrinsically meaningless speech-neutral acoustic
data. In particular, this claim has difficulty accounting for the perceived
constancy of phonemes spoken by different vocal tracts (vocal-tract
normalization) and spoken in different contexts of surrounding phonemes
(acoustic variability). The ability to recognize the invariance of a phoneme or
word spoken in different contexts and by different people is crucial for
language-learning infants. The phonemes they hear occur in a variety of phonemic
contexts; the vocal tracts of the older speakers they hear (and must
eventually base their own vocalizations on) differ proportionally from each
other, as well as from the infant's own vocal tract. Vocal-tract normalization
and acoustic variability do not appear to cause perceptual difficulties for
infants. Both infants and other animals apparently solve the normalization
and acoustic variability problems in their discriminations among phonemes
(e.g., Baru, 1975; Kuhl, 1979, 1980, 1981b; Lieberman, [year illegible]). The
psychoacoustic view does not adequately explain the infant's perceptual
solution to those problems. The nontrivial acoustic variations involved have thus
far defied speaker-independent and speech-neutral acoustic definitions for
either phonemes or words. Thus, it cannot be assumed that the infant's solution
focuses on speech-neutral acoustic information.

A problematic implication of the psychoacoustic view is that the infant 
must at some time move from perceiving speech in purely auditory terms to 
perceiving linguistic structures, such as phonemes (Jusczyk, 1981a, 1981b).
What would lead the infant to take the cognitive step from meaningless
acoustics to meaning at the level of either phonemes or words? One proposal
would be the empiricist philosophical perspective that infants learn meanings
by contextual association. However, the associationist solution is
unsatisfactory on logical grounds (e.g., E. J. Gibson, 1977; J. J. Gibson,
1966; J. J. Gibson & Gibson, 1955; Jenkins, 1974). Meaning cannot emerge
from meaninglessness, so some meaningful element would have to predate the
infant's first association.¹ The traditional empiricist perspective is that
the elements of meaning are extrinsic to the individual, introduced by sensory 
stimulation. However, the psychoacoustic view assumes that meaning is not 
intrinsic to the speech stimulus, and it has provided no argument that other 
stimulation is intrinsically meaningful. In other words, the psychoacoustic 
view gives us no reason to believe that sensory stimulation in any modality 
provides extrinsic elements of meaning.

If meaning cannot be assumed to derive from extrinsic stimulation, the
traditional nativist perspective offers the alternative proposal that elements
of meaning are innate or intrinsic to the infant. The prime candidates for
innate elements of meaning in infant speech perception, of course, would be
phonemes or phoneme contrasts. This is exactly the position of the phonetic
viewpoint questioned previously, and is antithetical to the psychoacoustic
viewpoint because it cannot be invoked for animal speech perception. A third
possibility is the constructivist perspective that the infant cognitively
constructs meaning for sensory stimulation that does not itself provide
intrinsic meaning. But this would still depend on some mechanism that
determined the nature of the meaning to be constructed, and again the most likely
mechanism for speech would be phoneme-based.

A fourth, nontraditional perspective on the source of meaning in speech
can be offered, however, which is not consistent with either the
psychoacoustic or the phonetic viewpoints. This is the ecological perspective
(e.g., J. J. Gibson, 1966) that meaning is directly available to perception in
the active, adaptive relation between the perceiver and the objects/events being
perceived. It will be discussed at greater length in the subsequent sections
of the chapter.

The psychoacoustic position may also be troubled by its evolutionary
proposal. It assumes that specialized perceptual "notches" in the sensitivity of
the mammalian auditory system along certain phonemically relevant acoustic
dimensions have imposed selective pressures on the phonemes that can be uttered
by the human vocal tract (e.g., Aslin & Pisoni, 1980; Kuhl, 1981b;
J. D. Miller, 1977; Stevens, 1972; Walley et al., 1981). An alternative
proposal is that the anatomy of the human vocal tract places quantal limits on
the sounds it can make (see Stevens, 1972); it is this fact that has placed
selective pressures on the evolution of specialized notches in the auditory
system.

Specializations of neural tissue, like any structural specializations,
are naturally selected (evolve) because they have suited some purpose. Yet
the psychoacoustic model fails to specify a purpose that could have selected
for the speech-related perceptual notches or discontinuities in the mammalian
auditory system. Certain basic properties of the auditory system probably are
shared by all mammals, reflecting selective adaptation to the commonalities in
their auditory environments such as the sounds of weather, vegetation,
predators, and prey. However, individual mammalian species do develop more
highly specialized sensitivities for certain acoustic properties that are
uniquely suited to the particulars of their own ecological niche (e.g., the bat;
Neuweiler, Bruns, & Schuller, 1980). For some mammals, notably humans,
species-specific vocalizations are particularly important to the species'
survival, and have probably placed selective pressures on the development of
specialized responsivities of the auditory system. In these species, we would
expect to find that specializations in auditory sensitivity have evolved to be
uniquely responsive to the vocal characteristics of the species (see also
Petersen, 1981; Zoloth, Petersen, Beecher, Green, Marler, Moody, & Stebbins,
1979), which is the converse of the psychoacoustic model of evolution. In the
case of speech perception by humans and animals, recall that the specialized
notches or discontinuities do indeed appear to be more sensitive and finely
tuned in human adults than in the animals studied (see Section 2.4).

The general principle in evolution and in ontogeny of the nervous system
has been that motor functions (e.g., vocalization) precede and often motivate
the development of the correlated sensory functions (Bekoff, 1981; Horridge,
1968). Consistent with that principle, motor areas develop in advance of
sensory areas at each level of the neuraxis (Jacobson, 1978), including the human
neocortex (Marshall, 1968; Tuchmann-Duplessis, Aroux, & Haegel, 1975). Even
more important, the motor-sensory precedence applies to the neural supports
for human speech perception and production: the development and maturation of
Broca's area (the motor speech cortex) precedes that for Wernicke's area (the
receptive speech cortex) (Rabinowicz, 1979). The more likely evolutionary
scenario, then, may be that the human auditory system's properties were
selected for best responsiveness to the sound-producing abilities of the
uniquely human vocal tract (e.g., Stevens, 1972; Studdert-Kennedy, 1981c),
rather than vice versa as the psychoacoustic view suggests.

In summary, both contemporary views of infant speech perception are 
flawed. The chapter's introduction pinpointed four knotty perceptual problems 
as requisites to discovering words in the speech stream: (a) disembedding
words from connected speech; (b) recognizing word pattern invariances across 
different utterances and vocal tracts; (c) recognizing the variations that do 
specify different word patterns; and (d) hearing how to imitate a pattern 
made by another person. Neither the psychoacoustic nor the phonetic viewpoint 
offers the infant adequate means for discovering words or phonemes in speech.

But what is the alternative? The remaining sections on speech source
perception focus on the organization of the speech medium for an answer. The 
vocal tract offers a structural and dynamic meaning that is intrinsic to 
speech itself, since as the source of speech it determines the shape of its 
acoustic product (e.g., Fant, 1973). It would be more parsimonious for 
perceivers to attend directly and actively to the available vocal-tract infor- 
mation that is intrinsic to speech, than to have the perceptual process 
mediated with a step involving the meaningful interpretation of meaningless 
superficial acoustics. According to the speech source view, meaning exists in
the perceiver's relation to speech at its source. This alternative approach
derives from the general ecological approach to perception taken by James Gibson
and his followers (e.g., J. J. Gibson, 1966; Fowler & Turvey, 1978;
Studdert-Kennedy, 1981a, 1981d; Summerfield, 1978; Verbrugge, Rakerd, Fitch,
Tuller, & Fowler, in press).

4. An Ecological Perspective on Infant Speech Perception 

Perception of the speech source implies that the infant attends to
intrinsically specified information about the vocal tract and the articulatory
events that shaped the speech medium. This premise is consistent with the
ecological argument that perceiving organisms actively seek information about
distal events, which is lawfully specified in the stimulus array (J. J.
Gibson, 1966; E. J. Gibson, 1977). It stands in contrast to depictions of
perception as the cognitive or neural transformation of proximal sensory data,
which are intrinsically meaningless and informationally impoverished with
respect to the distal event. The speech medium carries many parallel messages,
some of which are defined within a particular language, and thus presumably
not detected by infants, who do not yet recognize that phonemes can function
to distinguish word meanings. Others are human universals, such as the
paralinguistic messages of emotional affect, of regional accent, and of age or
gender effects on vocal-tract size and configuration.

The speech characteristics of interest for the current discussion are the
structural organization and articulatory gestures of the vocal tract.
According to speech source perception, to perceive these characteristics is to
simultaneously apprehend the constant anatomical structure and the
transforming positions of articulators in a speaking vocal tract (see also
Schubert, 1974). This proposition is supported by the speed and accuracy with
which adults and even young children can imitate speech sounds, in spite of the
"normalization problem" (e.g., Alekin, Klass, & Chistovich, 1962; Ferguson &
Farwell, 1975; Galunov & Chistovich, 1966; Kent & Forner, 1979; see
Studdert-Kennedy, 1981a). However, the proposition may seem counterintuitive in
two ways. First, common sense suggests that we can see the structure and
movements of objects, but that we hear only "sounds." Thus, the claim that we
hear structure and movements, especially the small hidden ones of vocal
tracts, may seem unlikely. Second, it may seem implausible that structure and
motion are captured at once in the same information, because the qualities of
form and movement seem dissociable in our experience. However, an ecological
appreciation of sound and hearing shows these "problems" to be false.

4.1 Ecological Acoustics and Speech

Acoustic energy is the radiation over time and space of a wave of rapid 
alternations in air pressure. It originates from the oscillatory motion of
some object or surface that compresses and rarefies the distribution of air
molecules around it. Both an object/surface and some oscillatory motion are
necessary to produce acoustic energy. In fact, their contributions to sound
cannot be dissociated. Structural properties constrain the sound-producing
motions an object/surface can undertake; in turn those motions temporally
deform the structure in a characteristic manner. It follows that the specific
properties of an acoustic flow (e.g., time-varying frequencies and amplitude
changes of the pressure wave) necessarily reflect those structural properties
of the object and its vibratory movements that shaped the sound-production.

By the ecological view, the interdependence between structure and 
transformation is at the core of real events and therefore of their perception 
(Shaw & Pittenger, 1977). The nature of an object is revealed to the 
perceiver through event-determined, co-defined information about structural 
invariants and transformational invariants in objects/events. These terms
refer, respectively, to the structural identity of an object undergoing some 
transformation or change (e.g., by its movement or structural deformation, or 
by changes in the observer's orientation to it), and the transformations or 
changes it partakes in. Consider, for example, one person saying three dif- 
ferent words as opposed to three people saying a single word. In the first 
instance, information common to the three words reflects the structural
invariance of that speaker's vocal tract. In the second case, there is a 
transformational invariant in the articulation of the single word by three 
structurally different vocal tracts. Structural and transformational invari- 
ants lawfully shape the energy medium that carries their message (i.e., 
acoustical or optical energy). The ecological premise is that, through their
modulation of the energy medium, transformations can perceptually specify an
object's structure. In support of this, infants' visual recognition of
objects and their structure is enhanced by watching the objects undergo
various spatial-temporal transformations (E. J. Gibson, 1980; Ruff, 1980, 1982).

From the ecological perspective, auditory perception is the co-detection
of the transformations (motions) and structure of the sound source, which are
veridically conveyed in the acoustic medium (see also Schubert, 1974; Warren
& Verbrugge, in press). Thus, structure and motion are not only seen but
heard. In the case of speech, the acoustic medium is better suited than
optics for conveying the structure and transformations of the vocal tract, many
of which are invisible in face-to-face communication as well as when the
speaker is out of view. Articulatory gestures may be beyond the capabilities
of vision in another way, since their speed and precision exceed the temporal
and spatial resolution of the visually perceived manual American Sign Language
(Studdert-Kennedy & Lane, 1980).

The unseen messages carried by natural speech reflect not only the origin
of its acoustic energy (respiratory and laryngeal), but also the structural
identity, biokinematic coupling, and specific movements of the speaker's
supralaryngeal vocal tract.² In vocalizations and musical sounds (among
others), the acoustic wave does not radiate freely from its oscillatory origin to
the perceiver: the medium is also molded by the structure and transformations
of an intervening resonating tube. The size and shape of a resonant cavity
determine its natural resonating frequency (or frequencies), at which the air
contained within its walls will oscillate when excited by a flow of air
introduced from outside. If the extrinsic air flow is already oscillating (e.g.,
when acoustic energy from the vibrating larynx is introduced to the
supralaryngeal vocal tract), then those oscillatory frequencies in the flow that
match the resonant properties of the tube will be amplified in intensity;
other frequencies that mismatch the resonant properties will be attenuated
(filtered out). The larger a resonating cavity is, the lower its primary
resonant frequency, which is also affected by the size and number and positions
of openings in the cavity. As its shape deviates from perfectly spherical or
cylindrical, particularly if there are corners or "side pockets," higher order
resonant frequencies may be added. Surface properties, such as the smoothness
and elasticity of the resonant tube's walls, largely determine the time course
of intensity changes in the resonated frequencies.
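
The inverse relation between cavity size and resonant frequency can be
illustrated with a minimal sketch. It assumes the standard textbook
idealization of a uniform tube closed at one end and open at the other, whose
resonances follow the quarter-wavelength relation F_n = (2n - 1)c / 4L. The
function name, the two tube lengths, and the speed-of-sound value are
illustrative assumptions, not values taken from the studies cited here, and a
real vocal tract is neither uniform nor rigid.

```python
def tube_resonances(length_cm, n_resonances=3, speed_of_sound_cm_s=35000.0):
    """Resonant frequencies (Hz) of an idealized uniform tube closed at one end.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    All parameter values are illustrative assumptions.
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_resonances + 1)
    ]


# Assumed tube lengths: a longer and a shorter idealized tube.
longer_tube = tube_resonances(17.5)
shorter_tube = tube_resonances(14.0)

print(longer_tube)   # [500.0, 1500.0, 2500.0]
print(shorter_tube)  # [625.0, 1875.0, 3125.0]
```

Under these assumptions every resonance of the longer tube lies below the
corresponding resonance of the shorter one, which is the sense in which a
larger cavity has a lower primary resonant frequency.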

In the case of speech, the critical sound-shaping properties of the
resonant tube include the shape and elasticity of the cheeks, throat, lips, and
tongue, and the position and rigidity of the teeth. The properties that allow
the vocal tube to transform in shape are especially important for its
articulatory gestures: the hinged movements of the jaw, and the moving and
deforming obstructions of the tongue, lips, and velum (which opens the nasal
passage to resonate for sounds like /m/ and /n/). These structural and
transformational properties all shape the acoustic speech wave. The time-varying
formants and other acoustic discontinuities (see Section 1.3) reflect, in
particular, rapid transformations of the vocal-tract resonant configuration
that are produced by movements of the tongue, lips, jaw, and velum, within the
constraints posed by the enduring anatomical relations within the tract. (For
more detailed discussions of the physical acoustics of speech, see Fant, 1973;
Flanagan, 1973.)

Since these distal source properties determine the acoustic shape of
speech, they should be available to perceivers (see also Studdert-Kennedy,
1981a, 1981d; Summerfield, 1978; Verbrugge et al., in press). Infants and
even animals should be able to detect at least some of the structural and
transformational invariants in speech, although evolutionary and ontogenetic
history will affect how well different perceivers are attuned to pick up the
various messages. As discussed earlier, although animals and human infants do
show phoneme boundary effects, their performance deviates significantly from
human adults' categorical perception.

The speech source perception view may be further clarified by comparison
and contrast with the psychoacoustic and the phonetic views. If auditory
perception is the detection of information about source structure and
transformations in the acoustic medium, then the perception of speech should
be abstractly similar to the perception of other sounds, in agreement with the
psychoacoustic view. However, the speech source perspective disagrees with
the psychoacoustic notions that the perceiver's focus is on event- or
speech-neutral acoustic parameters, and that these parameters need to be
transformed by the system into percepts. The speech source proposal is also
consistent in an important respect with the phonetic view. That is, speech
perception even by the infant is considered "special" and uniquely human;
however, that special perceptual quality is not agreed to be based on phonemes
for infants. Moreover, the relative emphasis on "speech" and "perception" is
different. Because the speech medium conveys its source to the perceiver,
acoustic cues need not be transformed into percepts, whether by codes for
phoneme categories or by neuromotor codes for phoneme production.

According to the speech source view, the specialness of speech perception
derives from the unique structural and transformational properties of its
sound source, the human vocal tract. It is unique in its complex anatomy, its
biokinematic organization, and its particular dynamic gestures (e.g.,
Lieberman, 1967; Liberman et al., 1971), all of which are reflected in its
acoustic
productions. Moreover, humans have a privileged relation to speech as the 
tool of human-specific language communication. 

Some of the advantages of this view for several important aspects of 
speech perception in adults and infants will be considered next. 

4.2 Speech Source Perception in Adults 

The major acoustic puzzles in speech perception research have been the
problems of acoustic variability and normalization (see Section 1.3). As a
reminder, the acoustic variability problem refers to the sometimes quite
striking variability in the acoustic properties of an invariantly perceived
phoneme, which occurs primarily when it is produced in different contexts of
surrounding phonemes. The variability is caused by coarticulation among
phonemes as well as by the articulatory trajectories that interconnect adjacent
phonemes. The acoustic properties of the phoneme are thus assimilated to the
acoustic properties of its neighbors (e.g., in Figure 1, the differences among
the /l/'s in "little" and "girl"). Another source of acoustic variability is
the wide variety of acoustic features that can identify a given phoneme (e.g.,
Lisker, 1978). These sorts of acoustic variations pose a greater empirical
puzzle for psychoacoustic accounts of consonant perception than of vowel
perception. They have much stronger effects on the formant trajectories and
other acoustic features associated with consonants than on the formant
frequencies in the nuclear portions of vowels (although the latter are also
affected to considerable degree in conversational speech).

The normalization problem refers to another acoustic context effect, 
caused by variations in the dimensions of different vocal tracts. It causes
greater difficulties for psychoacoustic explanations of vowel perception than
consonant perception. The formant frequencies in vowel nuclei are more
obviously affected by vocal-tract proportions than are the formant trajectory
patterns associated with consonants. The different effects of these two types of
acoustic variability in vowels and consonants suggest a difference in the
information those two phoneme classes convey, which has been supported by
several other lines of research on speech production (e.g., Fowler, 1980; Fowler
et al., 1980) and perception (e.g., Ades, 1977; Crowder, 1973; Cutting,
1974; Darwin, 1971; Pisoni, 1973; Studdert-Kennedy & Shankweiler, 1970).

That adults perceive an invariant identity underlying the acoustic varia- 
tions of a given consonant (e.g., acoustic differences for /d/ in "dee" versus 
"dah" versus "doo") is difficult for the psychoacoustic view to explain, be- 
cause it requires finding a unitary acoustic principle for the invariant per- 
cept. Although several auditory solutions have been proposed for the acoustic 
invariance problem, for example, perceptual "templates" for consonant-specific 
spectral (frequency) acoustic properties (Blumstein, 1980; Searle, Jacobson, 
& Payment, 1979; Stevens & Blumstein, 1978), these do not hold up well under
empirical test (Blumstein, Isaacs, & Mertus, 1982; Walley et al., 1981) or
logical scrutiny (e.g., Liberman, 1982; Studdert-Kennedy, 1981a, 1981d).
However, the acoustic variability is actually an advantage from the speech
source view that speech acoustics convey information about the structural and
transformational invariants of their vocal-tract source. That is, for the
variability that derives from the coarticulation of adjacent phonemes, the
form of that vocal-tract transformation should clarify rather than confuse the
source properties that identify both elements. As for the variety of acoustic 
features that can specify a given consonant, these also result from, and thus 
may offer equivalent information about, the vocal-tract invariants identifying 
that consonant. 

4.2.1 Consonant perception. Research on phoneme context effects has
found shifts in category boundary positions that are predictable from the
coarticulatory effects of different neighboring phonemes. For example, a
continuum between "s" and "sh" can be generated by varying only the center
frequency of the fricative noise. But the frequency of natural fricatives is
lower if the following vowel is "oo" rather than "ah." That is because the
lip-rounding for "oo," which lengthens the vocal tract and therefore lowers
its resonant frequencies, is coarticulated with the fricative (e.g.,
Bell-Berti & Harris, 1979). In support of the notion that perceivers detect
coarticulatory information, the "soo-shoo" boundary occurs at a lower
frication frequency than the "sah-shah" boundary (Mann & Repp, 1980). Similar
coarticulatory context effects have been found for stop consonant place of
articulation differences (Mann, 1980) and for stop consonant voicing boundaries
(e.g., Summerfield, 1982).

Research on the perceptual unity of multiple acoustic properties for a
consonantal distinction offers converging support for the speech source
position. The various acoustic properties are not perceived according to their
acoustic differences but are perceived instead as equivalent information about
the articulation of the same consonant (Best, Morrongiello, & Robson, 1981;
Fitch, Halwes, Erickson, & Liberman, 1980). Those perceptual equivalences and
context effects are difficult to explain by speech-neutral psychoacoustic
mechanisms (see Studdert-Kennedy, 1981d), and control studies have in fact
failed to find analogous effects in nonspeech perception (Best et al., 1981;
Minn, MdOdon, Russell, h Liberman, 1981; Summerf ield, 1982). 
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4.2.2 Vowel perception . If vowel perception is accomplished by detect^ 
ing the underlying vocal-tract conf iguratioa and transformations that remain 
invariant across the structural variations of different vocal tracts, then vo- 
cal-tract normalization would not ,be a problem for speech source perception. 
In contrast, the psychoacoustic account posits that a singular underlying neu- 
tral acoustic description of the vowel' must be derivable by some formula. No 
such description has yet been found because proportional differences among vo- 
' cal tracts, especially male versus female versus child, prevent a uniform 
scaling of vowel formant frequencies among speakers (Broad, 198i); that is,, 
fomiant frequency ratio3 are speaker-specific. Nor can the problem be solved 
through some formula that partials out sex and age differences based on the 
value of some independent acoustic feature such as fundamental frequency of 
the voice, which is the most obvious formant-independent feature that could 
differentiate those speaker characteristics. Trfe sex and age groups show 
considerable overlap in fundamental frequency, a laryngeal property that is 
imperfectly correlated with the variation in supraiaryngeal configurations 
that affect formant properties., Furthermore-, r as the latter observation r — 
gests, the perj^ived sex and age of a speaker depend on supralaryn*. 1 
characteristics rather than on fundamental frequency (Lehiste & Meltzer, 
1973). ThUM, the psychoacoustic approach is left in an unliable position: 
the acoustic normalizatiqn solution would appear to depend on a priori knowl- 
edge of the supraiaryngeal vocal-tract properties whose influence it is trying 
to dircumvent. 
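
The scaling argument can be illustrated with a small numerical sketch. The
formant values below are hypothetical round numbers chosen only for
illustration; they are not taken from Broad (1981) or from any data set cited
in this chapter, and the function names are likewise assumptions. The sketch
computes, for each vowel and formant, the factor needed to map one speaker's
value onto another's. If a uniform scaling existed, all of the factors would
coincide.

```python
# Illustrative sketch only: formant values (Hz) are hypothetical placeholders,
# not measurements from any study cited in the text.
MALE = {"i": (270, 2300), "a": (730, 1100), "u": (300, 870)}
FEMALE = {"i": (310, 2800), "a": (850, 1220), "u": (370, 950)}


def scale_factors(source, target):
    """Return {(vowel, formant_number): target / source} for every formant."""
    factors = {}
    for vowel, src_formants in source.items():
        pairs = zip(src_formants, target[vowel])
        for k, (src, tgt) in enumerate(pairs, start=1):
            factors[(vowel, k)] = tgt / src
    return factors


def is_uniform(factors, tolerance=0.01):
    """True if one scale factor fits every vowel and formant within tolerance."""
    values = list(factors.values())
    return max(values) - min(values) <= tolerance


factors = scale_factors(MALE, FEMALE)
for (vowel, k), factor in sorted(factors.items()):
    print(f"[{vowel}] F{k}: x{factor:.3f}")
print("uniform scaling possible:", is_uniform(factors))
```

With these placeholder values the required factors differ from vowel to vowel
and from formant to formant, which is the sense in which formant frequency
ratios are said to be speaker-specific.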

Of further relevance to the speech source view, identification performance
is better for vowels spoken in CVC context than for isolated vowels, even
though formant frequencies are more clearly differentiated among the isolated
vowels (Strange, Verbrugge, Shankweiler, & Edman, 1976). Likewise, similarity
judgments among vowels in CVCs are clearly differentiated along three
dimensions, which correspond closely to vowel articulatory factors, whereas
most perceivers differentiate isolated vowels along only one or two dimensions
(Rakerd & Verbrugge, 1982). These contextual effects suggest that
coarticulatory information aids vowel as well as consonant perception,
consistent with the ecological premise that structural invariants (vocal tract
configurations) are clarified by transformational (dynamic articulation)
properties (see Verbrugge, Shankweiler, & Fowler, 1980). Studies with CVC
syllables whose vowel nucleus has been replaced with silence, leaving only the
syllable-initial and syllable-final formant transitions (in correct temporal
relation), further support the ecological interpretation. Vowel identification
under these conditions is remarkably well-preserved (Strange, Jenkins, &
Edman, 1977), even if each remaining piece of coarticulatory information is
taken from a different-sexed speaker (Verbrugge & Rakerd, 1980). These
coarticulatory influences on vowel perception are no problem for the view that
we detect vocal-tract source information, but would be difficult to explain
via psychoacoustic mechanisms (Fowler & Shankweiler, 1978; Shankweiler,
Strange, & Verbrugge, 1977; but see Howell, 1981).

These consonant and vowel findings suit the speech source interpretation
of adult speech perception. Certainly, the experimental tasks often required
subjects to "recover phonemes from the speech stream" (Liberman, 1982). Since
speech conveys language, the detection of "pure" (nonlinguistic) structural
and transformational invariants of the vocal tract may rarely if ever be an
end in itself for adults, and may instead serve as a means to the linguistic
end of recognizing, for example, phonemes. This would not be the case for
infants, however, since phonemes can be "recovered" from the speech stream
only by listeners who already know they are there (Studdert-Kennedy, 1981d).
Phonemes may indeed be at the "surface of language" (Liberman, 1982) for
adults, but vocal-tract source information must be at the "surface of speech"
for prelinguistic infants. We now turn to some recent research with
prelinguistic infants, which offers strong support for speech source
perception.

4.3 Speech Source Perception by Infants

4.3.1 Infants' perceptual constancy for speech. Perceptual constancy is
at the base of the acoustic variability and vocal-tract normalization puzzles.
The findings just discussed suggest that the adult's solution lies in a
perceptual focus on the structural and transformational invariants of a
speaker's vocal tract. Even without the adult's linguistic motivations,
infants also show perceptual constancy for vowels and consonants spoken by
different people or in different phoneme contexts, as described earlier in the
chapter (Holmberg et al., 1977; Kuhl, 1979, 1980, 1981b). Therefore, the
invariant features they apprehend must exist at the surface of speech and not
only at the surface of language. The speech source view suggests that the
properties of relevance to the infant lie in the vocal tract and not in the
speech-neutral superficial acoustics.

A psychoacoustic interpretation of infant perceptual constancy works no
better than it did for adults. We cannot assume that infants are guided by
knowledge about phonemes in their solution of the two acoustic puzzles, so
some independent source of guidance to the invariant acoustic features of
vowels and consonants would be needed. One psychoacoustic solution to the
normalization problem might be that although adults do not solve it by
partialing out speaker differences based on fundamental frequency,
linguistically naive infants do. However, this approach does not work, because
infants as well as adults appear to rely on supralaryngeal information in the
formant structure of speech rather than on fundamental frequency when
perceiving speaker gender (C. L. Miller, Younger, & Morse, 1982). Nor does an
acoustic template model (e.g., Blumstein) appear to give an adequate
psychoacoustic explanation to the acoustic invariance problem of infants'
perceptual constancy for consonants across varying vowel contexts
(Studdert-Kennedy, 1981d; Walley et al., 1981).

What does remain constant in the utterances of a vowel or consonant by
different speakers or across different phoneme contexts is the underlying
similarity in vocal-tract structure and articulator positioning. Speech
source information would thus seem to offer a more straightforward metric than
speech-neutral acoustic invariance for infant perceptual constancy, as was
argued in the case of adult speech perception. Moreover, perception of speech
source information would certainly be a more direct guide than speech-neutral
acoustic patterns for the infant's attempts at vocal imitation of older
speakers and eventual production of words provided by her native language
environment.

Thus far, the central argument that the infant perceives speech source
information has been oriented around the acoustic medium. However, as
indicated in the next section on infants' recognition of auditory and visual
commonalities in speech, this information is provided by sight as well as by
sound. The intermodal perception of speech by infants provides strong support
for the speech source perception view, and is particularly difficult to
reconcile with the psychoacoustic and phonetic perspectives.

4.3.2 Infants' intermodal perception of speech. When adults listen and
watch someone speak, the acoustic and optic information about speech is not
perceptually independent. Rather, the two seemingly disparate types of
information are perceptually unified, implying that a common metric underlies
them. Prelinguistic infants likewise recognize underlying commonalities
between audio and visual presentations of speech that cannot be described in
linguistic terms for them. The speech source perception view suggests that
infants perceive intermodally by attending to the underlying articulatory
events that provide the auditory and visual information.

To appreciate the contribution of visual information to speech
perception, recall listening to someone speaking at the front of a room. It
probably seemed easier to understand what was being said if you could also
keep the speaker's face in view. This intuition has recently been empirically
validated with adult listeners. Under difficult listening conditions, adults
perceive speech more correctly when they can watch the speaker than when they
must rely on their ears alone (Binnie, Montgomery, & Jackson, 1974; Dodd,
1977; Summerfield, 1979). These findings suggest that listeners obtain
information about speech not only from the acoustic signal, but also from the
optical information that results from articulatory maneuvers.

Of course, the major responsibility for speech perception is carried by
the auditory modality. That blind adults successfully perceive speech, whereas
the deaf have serious difficulty with lipreading, would seem to imply that
auditory information is both necessary and sufficient for speech perception,
and that visual information plays a negligible role unless listening is
particularly difficult. Speech researchers accepted this logic until recently,
when MacDonald and McGurk (1978) and Summerfield (1979) reported that
listeners fail to recognize phonemic conflicts in concurrent auditory and
visual presentations of speech, instead perceiving a unified speech event.
The percepts did not veridically reflect either the acoustic or the optic
signal considered in isolation. For example, when perceivers watched a face
silently articulating "ga" while a voice said "ba," they heard "da." These
results indicate that in face-to-face speech perception, listening is not
simply supplemented by arbitrary, learned associations between vocal-tract
configurations and speech sounds. Rather, at the level of the speech event
itself, the information provided by the two modalities shows an intermodal
articulatory equivalence.

Two opposing views have been offered for adults' perception of
acoustic-optic equivalence in speech. MacDonald and McGurk (1978) have
suggested that the equivalence be described linguistically, in terms of
abstract features of phonemes. The other view, proposed by Summerfield (1979)
and based on the Fowler et al. (1980) ecological interpretation of speech
production findings, argues that the equivalence is nonlinguistic and
modality-free, arising from the dynamics of articulation. The first view has
been criticized, in part, because it accounts for only a limited number of the
speech percepts that result from audiovisual conflict (see Summerfield, 1979,
for detailed discussion of these and other criticisms).

In light of recent findings, the latter view appears to best account for
how prelinguistic infants perceive speech intermodally. Infants' sensitivity
to acoustic-optic equivalences in speech has been demonstrated under two
conditions. Under the first condition, infants were presented with
acoustic-optic displays in which the overall synchrony and the specific
articulatory details presented in the two modalities of information were
confounded. In one study, infants up to 4 months old saw the mirrored
reflection of a woman's face repeating nursery rhymes; the auditory signal was
either in synchrony or delayed relative to the optic presentation by 400
milliseconds. The infants watched the reflection significantly longer when
the visual and the auditory presentations were in natural synchrony (Dodd,
1979). In a second study, infants viewed two women speaking, in two adjacent
video films, while the concurrent speech of one woman was played over a
central loudspeaker. Infants preferred to look at the face that talked in
synchrony to the audio speech presentation (Spelke & Cortelyou, 1981).

It is unclear whether the infants were responding to the general
synchrony and/or the specific articulatory details of the optic and acoustic
displays, however, since these two aspects of audiovisual match were
confounded in both experiments. They might have only recognized the overall
synchrony, for example, for syllable onsets, between the speech seen and
heard. However, they might also have preferred watching the natural
acoustic-optic concurrence of specific articulatory gestures. For example,
infants might prefer to look at a speaker's lips being rounded and protruded
for the production of the vowel "oo" as they heard an audio "oo," as opposed
to looking at the speaker's lips being opened wider to produce "ah."

This prediction was recently tested experimentally (Kuhl & Meltzoff,
1982; MacKain, Studdert-Kennedy, Spieker, & Stern, 1981). Kuhl and Meltzoff
(1982) presented 4- to 5-month-olds with two adjacent films of a woman's face
synchronously articulating the vowels "ah" and "ee," while one of those vowels
was presented auditorily and in synchrony over a central loudspeaker. The
infants preferred to watch the film whose articulatory details specified the
vowel presented auditorily. In the MacKain et al. study, disyllables (e.g.,
"mama," "lulu") were presented audiovisually, under similar experimental
conditions, to 5- to 6-month-old infants. The infants looked significantly
longer at the video display whose articulatory dynamics matched the acoustic
presentations, for the disyllables "mama," "baby," and "zuzu." These findings
indicate that young infants recognize at least some auditory-visual
equivalences of articulatory gestures, and are not only sensitive to general
synchrony. Moreover, this intermodal recognition was accomplished in the
presumed absence of a language system, making linguistically based
explanations (e.g., MacDonald & McGurk) untenable for this age group. The
commonality that the infants recognized between the acoustic and optic
information may best be described in nonlinguistic terms (Summerfield, 1978).

Given that the infant seems attuned to detect vocal-tract source
information in speech, what may be the organization of the supporting
perceptual system? A consideration of the biological basis of this attunement
should move us closer to understanding the ease with which humans recognize
auditory-visual equivalences in articulatory details, and the infant's
apparent ease in learning to speak a first language. Research with adults
suggests that the answer lies in the functional asymmetries of the left- and
right-cerebral hemispheres of the human brain.

4.4 Left-Hemisphere Attunement for Articulatory Information

For the adult, the left-cerebral hemisphere shows a specialized advantage
for the perception of speech, in contrast to a right-hemisphere advantage for
the perception of music and certain other nonspeech sounds (e.g., Kimura,
1973). Even in young infants, the hemispheres are differentially responsive
to human speech versus other sounds. Auditory evoked response asymmetries in
young infants favor the left hemisphere when words or syllables are presented
auditorily, whereas the right-hemisphere response is stronger when musical or
other nonspeech sounds are presented (Molfese, Freeman, & Palermo, 1975). In
dichotic listening tests, consistent with the adult findings previously cited,
infants as young as 2½ to 3 months show a right-ear advantage (REA) in
discriminating among consonants, indicating a left-hemisphere superiority.
Conversely, they show a left-ear advantage (LEA) in discriminating notes
played by different musical instruments, indicating a right-hemisphere
superiority (Best, Hoffman, & Glanville, 1982; Entus, 1977; Glanville, Best,
& Levenson, 1977).

These functional asymmetries in infants indicate an early left-hemisphere
attunement to information in speech, which could be an important biological
support for the infant's perceptual discovery of the articulatory patterns of
spoken words. But the data do not indicate exactly the sort of information in
speech to which the left hemisphere is attuned (see Molfese, Nuñez, Seibert, &
Ramaniah, 1976). Two recent findings suggest that the infant's left
hemisphere is apparently attuned to information about the articulatory
gestures of the vocal tract.

As an additional result of their intermodal speech perception study,
MacKain, Studdert-Kennedy, Spieker, and Stern (1983) found a rightward
attentional bias (implying left-hemisphere activation: Kinsbourne, 1973,
1982), which facilitated the infants' recognition of acoustic-optic
commonality in the articulatory details of speech. In that experiment, infants
attended primarily to either the right or the left video monitor during the
synchronous audio presentation. An analysis of visual preferences indicated
that the infants recognized auditory-visual matches versus mismatches in
articulatory properties only when they were attending to the right video
monitor. Since intermodal perception of speech appears to entail the
recognition of its vocal-tract source properties, these results indicate the
infant's recognition of that information is facilitated by a left-hemisphere
attentional bias toward those properties.

The results of another study (Best, 1978) may further clarify which
aspects of human speech are the object of the infant's left-hemisphere
attunement. Adults show a consistent left-hemisphere advantage for consonant
perception, whereas isolated vowels yield a nonsignificant perceptual
asymmetry (e.g., Studdert-Kennedy & Shankweiler, 1970; Weiss & House, 1973).
This vowel-consonant difference in hemispheric perceptual asymmetry may depend
on the earlier discussed differences in the acoustic and articulatory
properties of vowels and consonants.

The aim of the infant study (Best, 1978) was to determine whether
3½-month-olds show a similar hemispheric difference in perception of
consonants versus vowels. The infants showed a clear left-hemisphere advantage
for discriminating a set of synthetic consonants, as adults did. However, the
infants also showed a clear right-hemisphere advantage for discriminating
steady-state synthetic vowels, which differs from adult reports.

The psychoacoustic interpretation of the adult hemisphere differences is
that the hemispheres are differentially specialized for processing the
acoustic features that differ between consonants and vowels (e.g., Cutting,
1974; Schwartz & Tallal, 1980). However, this interpretation confounds
acoustic and articulatory differences between vowels and consonants, and must
be rejected on methodological grounds (Studdert-Kennedy & Shankweiler, 1980),
as well as for the general criticisms against the psychoacoustic view. A
speech source interpretation would instead consider articulatory differences
between consonants and vowels. Indeed, consonant and vowel productions engage
different coordinations of the articulatory musculature (Fowler, 1980).
Moreover, vowel-consonant differences in intermodal speech perception effects
(Summerfield, 1979; Summerfield, McGrath, & Forster, 1982) suggest that such
articulatory differences may be influential in perception. The pattern of the
phoneme class differences in production and in intermodal perception suggests
that consonant information is conveyed in rapid articulatory changes, whereas
vowel information is conveyed in relatively more slowly changing
configurations of the tongue, lips, and jaw.

The implication of these two recent findings on infant hemispheric
asymmetries is that the left-hemisphere attunement takes the form of an
attentional bias toward information about rapid articulatory transformations.
In complement to that attentional bias, the infant's right hemisphere may be
better attuned to information about relatively more enduring structural
properties of sound-making objects, such as the structural properties of
instruments that determine their musical timbre and the configurations of the
articulators in the human vocal tract that determine steady-state vowel color.

In conclusion, these specializations of the cerebral hemispheres for
responding to different aspects of articulatory information in speech may
offer biological support for the infant's discovery and production of words.
In the final section, the relation of speech source perception to the
discovery of words and the broader motivation provided by the context of
communicative development will be briefly discussed.

5. The Broader Context of Communicative Development

The speech medium carries a number of parallel messages (Pike, 1959), and
not only the sort of vocal-tract source information we have been focusing on
at the surface of speech. To learn language, the infant must discover, or
learn to recognize, many or all of those other messages as they are specified
by convergent information in speech and in the context of its occurrence.
Some of the messages that must be discovered are linguistic, beginning with
words or phrases. Although the linguistic messages are more abstract than
speech source messages, their expression in the speech medium depends directly
on speech source information. Words and phrases are conveyed in the medium as
patterns of articulator configurations and transformations, which have
invariant properties across speakers, speaking rates, and surrounding speech
context. Therefore, the infant's eventual discovery of them depends on
attention to informational invariants in vocal-tract shapes and gestures as
they are patterned over time, and conveyed in both the acoustic and optic
media.

As for phonemes, their discovery as invariant vocal patterns that convey
difference in meaning appears to be served by the child's developing use of
words rather than vice versa (Menn, 1980; Menyuk & Menn, 1979). Given that
the function of phonemes in a language system is determined by word meanings
and contrast, it is consistent with this interpretation that maternal speech
to toddlers who produce words does include hyperdifferentiation in the
productions of some phoneme contrasts, whereas maternal speech to
prelinguistic infants does not include such hyperdifferentiation (Malsheen,
1980).

But the recognition of invariant vocal patterns, of course, is still not
support enough for the discovery of words and phonemes. In order to comprehend
and use language normally, the infant must also come to recognize the
speaker's communicative messages. Some of these appear at the surface of
speech, where the infants can detect them. Emotional affect may be conveyed
to the infant directly in the mother's speech, for example, through the level
and modulations of her voice pitch and intensity (see Stern, Spieker, &
MacKain, 1982). These communicative messages often gain converging support
from information in the visual and even in the haptic modalities; for example,
emotional affect often receives contextual support in the speaker's changing
facial expressions and the way she or he touches or holds the infant. The
argument here is that the infant's discovery of more abstract, communicative
messages, notably the referential meaning of words, requires such nonspeech
contextual support. Word meanings cannot be revealed for the first time
through the speech medium alone.

This observation brings us to the end of our discussion, moving as it
does beyond both the prelinguistic period and the infant's perception of the
surface of speech. In closing, however, it is suggested that the prelinguistic
infant's ability to perceive in speech the structure and transformations
of its vocal-tract source provides her or him a crucial tool for discovering
the more abstract messages of language.

Footnotes

1 The general underlying issue of meaning is complex and certainly cannot
be settled here. The source of meaning for our knowledge of the world is at
the heart of a centuries-long debate in epistemology between phenomenologists
and realists. Psychology has taken up this debate. No satisfactory solution,
acceptable to all, has been reached in either field. In the context of this
chapter, the discussion reflects the author's view on the relation of the
topic to how infants perceive speech.

2 Natural speech also identifies the speaker as a member of the human
species. Synthetic speech, insofar as it "works" perceptually, must capture
necessary information about human vocal-tract dynamics, and usually also about
the structure of a generic vocal tract, often appropriate for an adult male of
indeterminate age. Rarely, however, does synthetic speech capture sufficient
"textural" detail about a natural vocal tract to sound like a live human
speaker, even an unknown one.

VOWEL-TO-VOWEL COARTICULATION IN CATALAN VCV SEQUENCES*

Daniel Recasens



Abstract. Electropalatographic and acoustical data on V-to-V
coarticulatory effects were obtained for Catalan VCV sequences, with
the consonants representing different degrees of tongue-dorsum contact
(dorsopalatal approximant [j], alveolo-palatal nasal [ɲ],
alveolo-palatal lateral [ʎ] and alveolar nasal [n]). Results show that
the degree of V-to-V coarticulation in linguopalatal fronting and F2
frequency varies monotonically and inversely with the degree of
tongue-dorsum contact, with larger carryover effects than anticipatory
effects. The temporal extent of coarticulation also varies with
the degree of tongue-dorsum contact, much more so for anticipatory
effects than for carryover effects. Overall, results indicate that
V-to-V coarticulation in VCV sequences is dependent on the mechanical
constraints imposed on the tongue dorsum to achieve dorsopalatal
closure during the production of the intervening consonant. Moreover,
anticipatory effects but not carryover effects involve articulatory
preprogramming.

Introduction

Studies on coarticulation address the question of how the phonemic string
is produced in running speech. The failure to discover a one-to-one mapping
between phonemes and articulatory targets suggests that the production units
involve patterns of spatial and temporal coordination among several
articulators (see, for example, Bell-Berti & Harris, 1981). Fowler (1980) and
Fowler, Rubin, Remez, and Turvey (1980) have proposed that coarticulation
results naturally from such coordinated patterns of articulatory activity.
According to these researchers, the process of speech production is executed
by means of coordinative structures, namely, muscle groupings organized
functionally to actualize linguistic units in fluent speech. The constraints
on articulatory movement imposed by the coordinative structure define those
articulatory dimensions along which adjustment to context may take place.
Thus, in light of this approach, coarticulatory effects ought to be
predictable from constraints on articulatory displacement. On these grounds,
evidence is presented in this study for systematic variability in
transconsonantal vowel-to-vowel coarticulatory effects as a function of the
degree of tongue-dorsum contact for the intervening consonant.

Öhman (1966) has proposed a model to account for coarticulation for
bilabial, alveolar, and velar stops in VCV sequences. In this model, VCV
coarticulatory effects are interpreted as reflecting an underlying V-to-V
tongue movement with a superimposed consonantal constriction, which is
actualized by commands directed towards different regions of the tongue. Öhman
distinguishes at least three separate tongue regions that can be independently
controlled: regions that shape the whole tongue body (as used for the
production of vowels), the apical region (as used for the production of
alveolars), and the dorsal region (as used for the production of velars).
Tongue regions left uncontrolled by these consonantal commands can conform to
the underlying diphthongal gesture, thus allowing for V-to-V coarticulation.

Öhman's interpretation has the interesting implication that degree of
coarticulation should vary with the constraints exerted upon the kinematics of
the different tongue dimensions under control. Thus, for instance, it could
be that the production of place categories other than bilabial, alveolar, and
velar imposes restrictions upon tongue activity so severe as to almost prevent
V-to-V coarticulation from occurring. In fact, there is evidence from the
literature that palatal articulations block V-to-V coarticulation to a large
extent. Thus, it has been found for Russian palatalized consonants (produced
with a primary constriction plus some raising of the tongue dorsum towards the
palate) that formant transitions are barely influenced by the quality of the
transconsonantal vowel (Öhman, 1966; Purcell, 1979). Also, data on V-to-C
coarticulation show that English [j] (Lehiste, 1964; Stevens & House, 1964)
and Italian [ʎ] (Bladon & Carbonaro, 1979) are highly resistant to effects
from the surrounding vocalic environment.



The prediction tested in the present study was that the degree of V-to-V
coarticulation in VCV sequences varies monotonically and inversely with the
degree of tongue-dorsum contact required for the production of the consonant.
Thus, for consonants produced with varying degrees of constraint on
tongue-dorsum displacement towards the palate, more tongue-dorsum contact
ought to allow less transconsonantal coarticulation, and less tongue-dorsum
contact, larger transconsonantal coarticulatory effects. Moreover, degrees of
tongue-dorsum contact and degrees of transconsonantal coarticulation ought to
vary in similar amounts.
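
One way to state the monotonic, inverse part of this prediction operationally
is as a rank correlation of -1 between degree of contact and size of
coarticulatory effect across consonants. The sketch below is only an
illustration under stated assumptions: the numbers are made-up placeholders
(not measurements from this study), the units are arbitrary, and the helper
names are hypothetical.

```python
# Illustrative sketch only: all values are made-up placeholders, not data
# from this study. Larger "contact" = more tongue-dorsum contact; larger
# "effect" = larger V-to-V coarticulatory effect.

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


consonants = ["j", "ɲ", "ʎ", "n"]
contact = [0.9, 0.7, 0.5, 0.2]  # hypothetical degree of dorsal contact
effect = [40, 90, 150, 300]     # hypothetical size of V-to-V effect

rho = spearman_rho(contact, effect)
print("Spearman rho:", rho)
```

A rank correlation captures only the ordering. The further claim that contact
and coarticulation vary in similar amounts concerns the sizes of the steps
between consonants and would need a separate comparison.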

The dorsopalatal approximant [j], alveolo-palatal nasal [ɲ],
alveolo-palatal lateral [ʎ] and alveolar nasal [n] in Catalan (a Romance
language spoken in Catalonia, Spain) were chosen for analysis. The degree of
tongue-dorsum contact associated with these consonants varies in the order
[j]>[ɲ]>[ʎ]>[n], as traditionally described and according to a survey of
palatographic recordings from the literature across different Romance
languages and contextual conditions (e.g., Haden, 1938; Rousselot, 1924-1925).
Thus, in a language with this set of consonants, [j], [ɲ], [ʎ] and [n] ought
to show increasing degrees of V-to-V coarticulation. This hypothesis is based
on the assumption that articulatory control during the production of [j], [ɲ]
and [ʎ] is exerted upon tongue-dorsum raising towards the hard palate. On
these grounds, dorso-palatal [j] ought to show maximum degree of tongue-dorsum
contact and minimum degree of V-to-V coarticulation since all muscular
activity is directed towards this gesture; less tongue-dorsum contact and more
V-to-V coarticulation occurs for alveolo-palatals (more so for [ʎ] than for
[ɲ]) since muscular activity is directed simultaneously towards tongue-blade
contact and tongue-dorsum contact.

Another purpose of this investigation was to analyze: 1) the relative
salience of V1-to-V2 (carryover) vs. V2-to-V1 (anticipatory) effects; 2) the
temporal extent of coarticulatory effects.

Data on coarticulation in asymmetrical VCV sequences (mainly English)
with consonants involving lingual closure show large anticipatory and
carryover effects during closure and along the VC and CV transitions (see, for
review, Parush, Ostry, & Munhall, 1983). However, small and asystematic
anticipatory (English: Kent & Moll, 1972; German: Butcher & Weiher, 1976) and
carryover (English: Gay, 1974) V-to-V effects have been reported at the
steady-state vowel period. Several studies show that carryover effects are
larger than anticipatory effects for English (Bell-Berti & Harris, 1976; Gay,
1974). In this study, the relative salience of transconsonantal anticipatory
vs. carryover effects at the formant transitions and at the steady-state vowel
is investigated for Catalan.

If the anticipatory process reflects articulatory preprogramming and the
carryover process is primarily due to mechanical inertia constraints,
anticipatory effects should be more sensitive than carryover effects to the
temporal aspects of coarticulation. Recent evidence shows that this is the
case for English (Parush et al., 1983). In the present study, this issue is
also investigated for Catalan, as well as the extent to which V-to-V temporal
effects are dependent on or independent of the degree of dorsal contact
required for the production of the consonant.

I. Method 

A. Articulatory Analysis 

Electropalatographic (EPG) data were collected for the Catalan consonants
[j], [ɲ], [ʎ] and [n] in all possible VCV combinations for V = [i], [a], [u].
All combinations can occur in running speech in Catalan. The utterances were
embedded in a Catalan frame sentence "Sap ___ poc," meaning "He knows ___ just
a little." A single speaker of Catalan (speaker Re, the author), also fluent
in Spanish, English, and French, repeated all utterances 10 times with the
artificial palate in place while the electropalatographic signal and the
corresponding acoustic signal were recorded on tape for later analysis.

A mouth cast for speaker Re was used to build an artificial palate. The
artificial palate is a device, 2 mm thick, made of acrylic resin, equipped
with 63 small gold electrodes evenly distributed over its surface (shown in
Figure 1). Patterns of linguopalatal contact were tracked over time (1 frame =
15.6 ms) on a display panel with an array of 63 lamps in an analogous
configuration; as the tongue touches an electrode, the corresponding lamp
lights up. Electrodes were reproduced on the panel in a two-dimensional
display (as in Figure 1), which does not account for the vaulting of the
subject's palate. Detailed information about this palatographic system (Rion
Electropalatograph Model DP-01) is available in Shibata (1958) and Shibata et
al. (1978).

Figure 1. Electropalate.



The electrodes are arranged in five semicircular rows. For purposes of
data interpretation, they are grouped in articulatory regions and sides,
taking advantage of their equidistant arrangement in parallel curved rows on
the artificial palate. As shown in Figure 1, the surface of the palate was
divided into four articulatory regions (alveolar, prepalatal, mediopalatal,
and postpalatal) and into two symmetrical sides (right and left) by a median
line traced along the central range of electrodes. This division into
articulatory areas on the palatal surface is based on anatomical
considerations (Catford, 1977).

For each VCV utterance, data were tabulated from onset to offset of palatal
contact, for a variable number of on-electrodes on each side of the palate.
To tabulate the placement of on-electrodes frame by frame, every electrode
was given a code number on each semicircular row for each side of the palate,
starting from the backmost electrode (1) up to the frontmost electrode (8.5
for row 1, 7.5 for row 2, and so on). Electrodes placed on the median line
were assigned to both sides; thus, row 1 had 8.5 electrodes, row 2 had 7.5,
and so on (see Figure 1). Given the fact that contacts were always made first
at the rear of the palate (except for the sequence [uʎu], as discussed in the
Results section) and that back electrodes stayed on during the entire
production, the number of on-electrodes on each row was equivalent to the
code number of the frontmost on-electrode. Therefore, a recording of each
code number for each row in each frame simultaneously indicated the amount of
linguopalatal contact and the degree of linguopalatal fronting for that row
at that moment in time. For data interpretation, means were obtained by
averaging the number of on-electrodes on each row frame by frame across
repetitions of the same sequence lined up according to the point of maximum
contact (PMC). PMC for a token was considered to be at the frame that
presented the highest number of on-electrodes.
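
The line-up and averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: each token is assumed to be stored as a list of frames, each frame as a list of per-row code numbers, and the names `pmc_frame` and `average_aligned` and the sample values are hypothetical.

```python
from statistics import mean


def pmc_frame(token):
    """Index of the frame with the most on-electrodes (first one if tied)."""
    totals = [sum(frame) for frame in token]
    return totals.index(max(totals))


def average_aligned(tokens, row):
    """Average one row's code numbers across tokens lined up at PMC.

    Returns {frame offset from PMC: mean code number}, restricted to the
    offsets that every token covers.
    """
    pmcs = [pmc_frame(t) for t in tokens]
    before = min(pmcs)
    after = min(len(t) - 1 - p for t, p in zip(tokens, pmcs))
    return {
        off: mean(t[p + off][row] for t, p in zip(tokens, pmcs))
        for off in range(-before, after + 1)
    }


# Two hypothetical repetitions; each frame lists code numbers for two rows.
tokens = [
    [[1, 0], [3, 1], [5, 2], [2, 0]],
    [[2, 1], [6, 2], [3, 1]],
]
averaged = average_aligned(tokens, row=0)
```

Restricting the average to offsets shared by all tokens is a design choice of this sketch; the source does not say how tokens of unequal length were handled.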

B. Acoustical Analysis 

Four repetitions of all VCV combinations from this and two other Catalan
speakers (Bo and Ca), also fluent in Spanish, were recorded for acoustical
analysis. They were digitized at a sampling rate of 10 kHz, after preemphasis
and low-pass filtering. An LPC (linear predictive coding) program included in
the ILS (Interactive Laboratory System) package available at Haskins
Laboratories was used for spectral analysis. Dynamic trajectories for the
three lowest spectral peaks from onset to offset of voicing, as detected on
the waveform displays, were reproduced on tracing paper and averaged across
repetitions of the same sequence lined up according to PMC. To identify PMC
on the acoustic wave for speaker Re, EPG data were also digitized at a
sampling rate of 20 kHz, with no previous preemphasis or filtering. Labeling
procedures were executed by means of WENDY (Haskins Laboratories Wave Editing
and Display system). For speakers Bo and Ca, for whom no EPG data were
available, PMC was estimated by visually identifying the F1 frequency minimum
in the transition from the first vowel to the consonant. This procedure was
chosen on the grounds that, of all the spectral characteristics present in
the acoustical display of the utterances under study, such a point was found
empirically to match PMC for speaker Re.

For each consonant, articulatory and acoustical data are presented as a
function of time, considering first the general production characteristics in
symmetrical VCV environments, and subsequently V-to-V coarticulatory effects
in asymmetrical VCV environments. In the articulatory domain, patterns of
contact in the palatal region (mediopalate and postpalate) that reflect
tongue-dorsum activity are of particular concern; in the acoustic domain, F2
frequencies that, for palatal and alveolar consonants, reflect changes in the
size of the back cavity behind the primary constriction and in degree of
palatal constriction (Fant, 1960) are emphasized.

II. Results 

A. General Production Characteristics

VCV utterances involving [j], [ɲ], [ʎ], and [n] were found to exhibit
different patterns of linguopalatal contact and contrasting F2 patterns. Such
patterns were correlated with different degrees of tongue-dorsum contact
required for the production of each consonant, with [j]>[ɲ]>[ʎ]>[n].

1. Articulatory Data

Trajectories of linguopalatal contact in VCV symmetrical environments for
[j], [ɲ], [ʎ], and [n] with V = [i], [a], [u] are displayed in the top panels
of Figures 2, 3, 4, and 5. Each trajectory represents an average over 10
repetitions. Each panel provides data on linguopalatal contact (vertical
axis) over time (horizontal axis). Data have been displayed for three rows of
electrodes on the right side of the palate, namely, row 1 (contact with the
tongue sides), row 3 (contact with the region between the tongue sides and
the center of the tongue dorsum) and row 5 (contact with the center of the
tongue dorsum). Linguopalatal contact has been plotted in terms of the code
numbers for any on-electrodes on each row starting from the backmost
electrode (1) up to the frontmost electrode (8.5 for row 1, 6.5 for row 3,
3.5 for row 5). As explained in the Method section, the plot of each
trajectory over time represents the frontmost contacted electrode. For all
consonants, contact is cumulative from back to front such that the frontmost
electrode is also a good representation of total amount of contact. For
[uʎu] (see Figure 4), the tongue did not always make contact with electrode 1
on row 1 of the artificial palate, presumably because of the positioning of
the tongue sides required to allow lateral airflow. Time has been measured in
ms frame by frame. The line-up point for VCV sequences with [ɲ], [ʎ] and [n]
is at PMC. Line-up procedures for [VjV] sequences were handled differently.
EPG data for [aja] and [uju] showed a single frame (PMC) with maximum contact
all over the palatal surface; however, maximum contact for [iji] was found to
last for six frames (about 95 ms). To account for this contrast, PMC for
[aja] and [uju] was lined up with the midpoint of the period of maximum
contact for [iji].
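
As an illustration of the [iji] line-up rule, the sketch below finds the period of maximum contact in a token and returns its midpoint, which reduces to PMC when the maximum lasts a single frame. It assumes contact is summarized as one on-electrode total per frame; the function names and the sample totals are hypothetical, and only the 15.6-ms frame duration comes from the Method section.

```python
FRAME_MS = 15.6  # frame duration reported for the EPG system


def max_contact_period(totals):
    """Return (first, last) frame indices of the first run of maximum contact."""
    peak = max(totals)
    first = totals.index(peak)
    last = first
    while last + 1 < len(totals) and totals[last + 1] == peak:
        last += 1
    return first, last


def lineup_frame(totals):
    """Midpoint of the period of maximum contact (may fall between two frames)."""
    first, last = max_contact_period(totals)
    return (first + last) / 2


# Hypothetical on-electrode totals per frame.
plateau = [20, 31, 40, 40, 40, 40, 40, 40, 33]  # six-frame maximum, as for [iji]
single = [12, 25, 38, 27, 14]                   # one-frame maximum, as for [aja]

first, last = max_contact_period(plateau)
plateau_ms = (last - first + 1) * FRAME_MS  # six frames, close to the 95 ms cited
```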

For all consonants in all sequences (see Figures 2 through 5), onset of
contact occurs earlier at the tongue sides (row 1) and intermediate tongue
regions (row 3) than at the center of the tongue dorsum (row 5); analogously,
offset of contact occurs later on rows 1 and 3 than on row 5. Displacement
along the vertical axis for each row indicates degree of linguopalatal
fronting. All sequences show that the degree of linguopalatal fronting
increases during the VC period from onset to PMC and decreases during the CV
period from PMC to offset. This pattern of displacement over time has been
reported in the literature for dorsal articulations such as [j] and velar
consonants (Kent & Moll, 1972).

Trajectories for all consonants on row 5 show that tongue-dorsum contact
decreases for [j], [ɲ]>[ʎ]>[n] (see also Introduction). Thus, the number of
on-electrodes at PMC on row 5, adding across different vocalic conditions for
each consonant, varies for [j] (5.3), [ɲ] (5.3), [ʎ] (2.9), [n] (1.3).
Moreover, vowel [a] shows contact with [j] but not with [ɲ] nor with [ʎ] and
[n]. Vowel [u] shows no contact with [n]. Differences in degree of
tongue-dorsum contact for alveolo-palatals [ɲ] and [ʎ] vs. alveolar [n] are
related to the fact that, while the two categories of place of articulation
involve alveolar contact, alveolo-palatals but not alveolars are produced
with simultaneous raising of the tongue dorsum towards the palatal vault,
resulting in lingual contact at the center of the mediopalatal and
postpalatal regions.

Table 1 shows maximum and minimum onset and offset contact values with
respect to PMC at the tongue sides (row 1) and at the center of the tongue
dorsum (row 5), as derived from Figures 2 through 5. These values give a good
estimate of the duration of the VC period (from onset to PMC) and the CV
period (from PMC to offset) of linguopalatal contact. Onset values across
rows show the pattern [j]>[ɲ]>[ʎ], [n]; offset values across rows show the
pattern [j]>[ɲ]>[ʎ]>[n]. Thus, VC, CV, and VCV contact durations decrease as
the degree of tongue-dorsum contact for the consonant decreases.

Figure 2. Trajectories of articulatory dynamics over time (in ms) for [j] in
contrasting symmetrical environments lined up at PMC (for [aja] and [uju])
and at the MC midpoint (for [iji]) (speaker Re). Top: trajectories for the
frontmost contacted electrode on rows 1 (tongue sides), 3 (between the tongue
sides and center of the tongue dorsum) and 5 (center of the tongue dorsum) of
the right side of the palate; electrode code numbers and articulatory regions
have been given for each row. Bottom: trajectories for F1, F2 and F3 in Hz.

Figure 3. Trajectories of articulatory dynamics over time for [ɲ] in
contrasting symmetrical environments lined up at PMC (speaker Re). Top: EPG
data; bottom: acoustical data. See Figure 2 for details about the displays.

Figure 4. Trajectories of articulatory dynamics over time for [ʎ] in
contrasting symmetrical environments lined up at PMC (speaker Re). Top: EPG
data; bottom: acoustical data. See Figure 2 for details about the displays.

Figure 5. Trajectories of articulatory dynamics over time for [n] in
contrasting symmetrical environments lined up at PMC (speaker Re). Top: EPG
data; bottom: acoustical data. See Figure 2 for details about the displays.

Table 1

Maximum and minimum values of onset and offset time (in ms) for linguopalatal
contact at the tongue sides (row 1) and at the center of the tongue dorsum
(row 5) for consonants [j], [ɲ], [ʎ], and [n] in symmetrical VCV environments.
Data are from one Catalan speaker (Re).

Table 2

Maximum and minimum F2 values (in Hz) at PMC for consonants [j], [ɲ], [ʎ], and
[n] in symmetrical VCV environments. Data are from three Catalan speakers
(Re, Bo, and Ca).

The trajectories give information about average velocity of linguopalatal
displacement over time. Average velocity of movement can be obtained by
dividing the total displacement for a given movement by the time required to
execute the movement (Kuehn & Moll, 1976). Accordingly, velocity increases as
duration decreases with degree of displacement remaining constant.
Trajectories for the sequences [aCa] and [uCu] in Figures 3, 4 and 5 show
that, for consonants involving fronted alveolar contact ([ɲ], [ʎ], and [n])
and, thus, similar degree of tongue-tip and/or tongue-blade displacement,
time to achieve and release alveolar constriction (see row 1) decreases for
[ɲ]>[ʎ]>[n]. Therefore, velocity of displacement for this articulatory
gesture increases in that progression, as the degree of tongue-dorsum contact
for the consonant decreases. On the other hand, velocity decreases as degree
of displacement decreases and duration increases. This is the case for [j]
vs. all other consonants; thus, [j] is articulated with a lesser degree of
alveolar fronting and involves greater VCV contact duration. Overall, at the
tongue sides, the velocity of displacement appears to be inversely related to
the degree of tongue-dorsum contact.
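
The velocity argument above amounts to a single division, sketched here with hypothetical numbers. The function name, the units (electrode code numbers per ms), and the displacement and duration values are assumptions for illustration; none are measurements from the study.

```python
def average_velocity(displacement, duration_ms):
    """Average velocity = total displacement / time taken to execute it."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return displacement / duration_ms


# Hypothetical values: the same row-1 displacement achieved in less time.
slow = average_velocity(displacement=7.5, duration_ms=150.0)
fast = average_velocity(displacement=7.5, duration_ms=75.0)
```

With displacement held constant, halving the duration doubles the average velocity, which is the relation invoked for the consonants with fronted alveolar contact.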

In summary, several patterns of fronting, duration, and velocity of
linguopalatal contact vary with the degree of tongue-dorsum contact for
[j]>[ɲ]>[ʎ]>[n].

2. Acoustic Measurements 

Table 2 reports maximum and minimum F2 values for [j], [ɲ], [ʎ], and [n] at
PMC in symmetrical environments for speakers Re, Bo and Ca. As the table
shows, F2 values for all speakers decrease for [j]>[ɲ]>[ʎ]>[n], namely, as
the degree of tongue-dorsum contact decreases. Moreover, since F2 is
dependent on the back cavity behind the place of constriction for [j] and
[ʎ], its frequency decreases (for [j]>[ʎ]) as that cavity becomes larger (for
[ʎ]>[j]). F2 for nasal consonants [ɲ] and [n] is presumably pharynx-cavity
dependent, as indicated by an F2 continuation from V1 into the consonant in
Figures 3 and 5 (bottom) and given the fact that, for nasal consonants, the
mouth cavity behind the constriction acts as a shunting cavity. On these
grounds, the pharynx-cavity size for [ɲ] and [n] ought to be smaller than the
whole mouth-pharynx system for [j] and [ʎ] and, thus, cause a higher F2;
however, acoustical data reported in Table 2 show that this is not the case.
Also, if, as revealed by the area functions of Russian [ɲ] and [n] reported
by Fant (1960), Catalan [ɲ] and [n] are produced with similar pharynx-cavity
size, these two consonants ought to show similar F2 frequencies; however,
acoustical data reported in Table 2 show a much higher F2 for [ɲ] than for
[n]. In the absence of X-ray data on vocal tract configurations for these two
Catalan consonants, it can only be stated with confidence that F2 differences
for [j], [ɲ], [ʎ] and [n] are correlated with differences in degree of
tongue-dorsum contact.

The same F2 relationship holds during the consonantal steady-state period
for all speakers. This is exemplified by the displays of formant trajectories
for speaker Re, lined up at PMC with the EPG data in Figures 2, 3, 4, and 5
(bottom). The steady-state period for [j], [ɲ], [ʎ], and [n] lasts roughly
from PMC up to +75 ms. Thus, little change in F2 frequencies appears to be
taking place during the consonantal steady-state period and, presumably, in
degree of tongue-dorsum contact.

In summary, as for the EPG data, F2 trajectories show frequency values that
vary with the degree of tongue-dorsum contact for [j]>[ɲ]>[ʎ]>[n].

B. Coarticulation

It was also found that the extent of V-to-V coarticulatory effects for
[j]<[ɲ]<[ʎ]<[n] is inversely related to the different degrees of
tongue-dorsum contact.

1. Articulatory Data

Trajectories of linguopalatal contact were plotted for contrasting V2 to
study anticipatory coarticulation and for contrasting V1 to study carryover
coarticulation. All VCV sequences except [ijV] and [Vji] were lined up
according to PMC. For [ijV] sequences, in which the period of maximum contact
lasts for several frames, the onset of the period of maximum contact was
taken as the line-up point in measuring anticipatory effects; for [Vji]
sequences, for the same reasons, the offset of the period of maximum contact
was taken as the line-up point in measuring carryover coarticulation.
Coarticulation was considered to occur when an observable difference between
two vowels in fronting of linguopalatal contact caused an analogous
difference to occur on the other side of the line-up point and such
difference was found to be significant at some moment in time. Since the main
concern was to measure the correlation between degree of tongue-dorsum
contact and degree of transconsonantal coarticulation, only data from rows 3
and 5 were selected for analysis in view of the fact that those rows show
contact at the palatal region exclusively. The analysis procedure chosen to
study V-to-V coarticulatory effects is described below.

Figure 6 shows anticipatory effects for [uʎV] (top, two upper panels) and
carryover effects for [Vʎu] (bottom, two upper panels) on rows 3 and 5 at the
right side of the palate. For the anticipatory condition, differences in
linguopalatal fronting can be observed during V2 on rows 3 and 5 as
[i]>[u]>[a]. On row 3 such differences cancel out during the period of
maximum contact but appear between V1 onset and 15 ms before PMC as V2 =
[i]>[u], [a]; two-tailed t-tests show that anticipatory effects for V2 =
[i]>[u] (but not for V2 = [i]>[a]) are significant (p<.05) between -60 and
-30 ms. No significant anticipatory effects occur on row 5.

For the carryover condition, differences in linguopalatal fronting can be
observed during V1 on rows 3 and 5 as [i]>[u]>[a]. Differences cancel out
during the period of maximum contact (on row 3 but not on row 5) but appear
during V2 (as V1 = [i], [a]>[u] on row 3 and as V1 = [i]>[u]>[a] on row 5).
They were found to be significant for V1 = [i]>[u] on rows 3 and 5 (p<.01)
and for V1 = [i]>[a] on row 5 (p<.001) between PMC and V2 offset.
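
The frame-by-frame comparison illustrated above can be sketched as follows. This is a hedged illustration only: the text specifies two-tailed t-tests but not the variant, so a pooled-variance two-sample statistic is assumed here, and the critical value 2.101 applies only to the assumed case of 10 repetitions per sequence (18 degrees of freedom, p < .05). Function names and the sample code numbers are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance

T_CRIT_05_DF18 = 2.101  # two-tailed critical t, p < .05, 18 df (10 + 10 - 2)


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (an assumed choice of test)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:  # no variability at all in either sample
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def significant_offsets(seq_a, seq_b, t_crit=T_CRIT_05_DF18):
    """Offsets (ms from PMC) at which two sequences differ significantly.

    seq_a, seq_b: {offset_ms: [code number per repetition]} for one row.
    """
    shared = sorted(set(seq_a) & set(seq_b))
    return [t for t in shared if abs(t_statistic(seq_a[t], seq_b[t])) > t_crit]


# Hypothetical row-3 code numbers, 10 repetitions per sequence.
v2_i = {-60: [5, 5, 6, 5, 6, 5, 5, 6, 5, 5], -45: [5] * 10, 0: [6] * 10}
v2_u = {-60: [4, 4, 4, 5, 4, 4, 4, 4, 5, 4], -45: [5] * 9 + [4], 0: [6] * 10}
effects = significant_offsets(v2_i, v2_u)
```

In this made-up example only the frame at -60 ms exceeds the critical value; a different number of repetitions would require a different `t_crit`.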

This procedure was used to analyze effects between all possible VCV pairs
in all coarticulatory conditions for all consonants reported in this study.
Any possible coarticulatory effect on rows 3 and 5 on both sides of the
palate, as determined visually from the plottings for all contextual
combinations of VCV pairs, was tested frame by frame by means of the t-test
procedure. Transconsonantal anticipatory effects (Table 3) and carryover
effects (Table 4) are reported for all consonants in all vocalic
environments. For each VC context (Table 3) and CV context (Table 4), the
size of the significant coarticulatory effects and the onset and offset times
of such effects are given for pairs of V2 (Table 3) and V1 (Table 4)
differing in degree of linguopalatal fronting. The magnitude of the
significant coarticulatory effects is plotted for different levels of
significance (* p<.05; ** p<.01; *** p<.001); data correspond to the largest
significant effects on rows 3 and 5 on either side of the palate. Onset and
offset times of the significant coarticulatory effects are plotted in ms. The
number of ms before "/" indicates onset time (as for -60 in the case of [uʎi]
vs. [uʎu]) and the number of ms after "/" indicates offset time (as for -30
in the case of the same pair of sequences). No number before "/" indicates
onset of coarticulation at V1 onset (anticipatory effects) and at PMC
(carryover effects), and no number after "/" indicates offset of
coarticulation at PMC (anticipatory effects) and at V2 offset (carryover
effects).

Figure 6. Top: EPG data (two upper panels) and F2 data (lower panel) on
anticipatory effects over time for sequences [uʎV] lined up at PMC. Bottom:
EPG data (two upper panels) and F2 data (lower panel) on carryover effects
over time for sequences [Vʎu] lined up at PMC. Data correspond to speaker Re;
EPG data correspond to rows 3 and 5 on the right side of the palate.

Table 3

EPG data on significant anticipatory (V2-to-V1) effects for different degrees
of tongue-dorsum fronting. The magnitude of the effects is plotted for
different levels of significance (*p<.05; **p<.01; ***p<.001). Onset of
coarticulation (in ms before PMC) occurs at the number preceding "/" or at V1
onset if no number is present; offset of coarticulation (in ms before PMC)
occurs at the number following "/" or at PMC if no number is present.

Table 4

EPG data on significant carryover (V1-to-V2) effects for different degrees of
tongue-dorsum fronting. The magnitude of the effects is plotted for different
levels of significance (*p<.05; **p<.01; ***p<.001). Onset of coarticulation
(in ms after PMC) occurs at the number preceding "/" or at PMC if no number
is present; offset of coarticulation (in ms after PMC) occurs at the number
following "/" or at V2 offset if no number is present.
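
The onset/offset convention used in Tables 3 and 4 can be made concrete with a small parser. The table bodies are not reproduced here, so the cell layout assumed below (asterisks followed by `onset/offset`, e.g. `*-60/-30`) is hypothetical, as are the function name and dictionary keys; only the default readings of a missing number follow the text.

```python
import re

# Reading of a missing number, per condition: (default onset, default offset).
DEFAULTS = {
    "anticipatory": ("V1 onset", "PMC"),
    "carryover": ("PMC", "V2 offset"),
}

CELL = re.compile(r"(\*{1,3})\s*([+-]?\d+)?/([+-]?\d+)?")


def parse_effect(cell, condition):
    """Decode one hypothetical table cell such as '*-60/-30' or '**/'."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    stars, onset, offset = match.groups()
    default_onset, default_offset = DEFAULTS[condition]
    return {
        "level": len(stars),  # 1: p<.05, 2: p<.01, 3: p<.001
        "onset": int(onset) if onset else default_onset,
        "offset": int(offset) if offset else default_offset,
    }


example = parse_effect("*-60/-30", "anticipatory")
open_ended = parse_effect("**/", "carryover")
```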



a. Anticipatory effects (Table 3). Data show that, as expected, the
instances and the degree of significance of the anticipatory effects decrease
for [n]>[j] and for [n]>[ʎ]>[ɲ] as tongue-dorsum contact increases. However,
[j] shows more contact and more (not less, as expected) coarticulation than
[ɲ] and [ʎ]. According to Table 3, large anticipatory effects for [j] occur
for [aCV] and [uCV] when V2 = [i] vs. [a], [u]; given that [i] and [j] are
produced with a highly similar articulatory configuration, it could be that,
during V1, the speaker shows greater fronting for V2 = [i] vs. [a], [u] to
make sure that the upcoming consonant will be produced with a very salient
gesture so that it can be articulatorily and perceptually distinguishable
from V2 = [i]. Data on F2 (see section IIB.2a) suggest strongly that the same
effects occur for this and other speakers when [j] is preceded by V1 = [i].
This strategy supports the view that the process of anticipatory
coarticulation is regulated by articulatory preprogramming.

Onset time of the anticipatory effects occurs always at V1 onset for
consonants that involve large tongue-dorsum contact ([j] and [ɲ]) and at
different moments in time before PMC (at V1 onset, -60 ms and -45 ms) for
consonants that involve less tongue-dorsum contact ([n] and, to a lesser
extent, [ʎ]). Offset time values show that anticipatory effects can cancel
out about the period of closure for articulations that show large
tongue-dorsum contact (at 30 ms before PMC for [j], [ɲ] and [ʎ]; at the onset
of V1 = [i], about -140 ms) but not for articulations that show small
tongue-dorsum contact ([n]).

Overall, it appears that consonants that involve low requirements on
tongue-dorsum activity (as for [n]) show a small degree of tongue-dorsum
contact and allow large transconsonantal anticipatory effects with different
onset times before PMC. On the other hand, as tongue-dorsum contact increases
(as for [ʎ] and [ɲ] but not for [j], for reasons stated above), consonants
allow small transconsonantal anticipatory effects with a fixed onset time at
V1 onset.

These coarticulatory phenomena suggest that the magnitude and the temporal
extent of transconsonantal anticipatory coarticulation in VCV sequences are
controlled by anticipatory preprogramming with reference to the mechanical
constraints on articulatory activity required for the production of the
consonant.

b. Carryover effects (Table 4). The instances and the degree of
significance of the carryover effects decrease (for [n]>[ʎ]>[ɲ]>[j])
inversely and monotonically with the degree of tongue-dorsum contact (for
[j]>[ɲ]>[ʎ]>[n]). For all consonants (except for [j], for reasons indicated
in section IIB.1a), carryover effects are larger than anticipatory effects;
also, while anticipatory and carryover effects occur from front vs. back
vowels, carryover effects among back vowels are much larger than anticipatory
effects.

Contrary to anticipatory coarticulation, no contrasting timing effects are
found among different consonants; thus, carryover coarticulation extends
generally from PMC up to V2 offset.

These coarticulatory phenomena suggest that transconsonantal carryover
coarticulation in VCV sequences results from mechanical inertia constraints
on articulatory activity required for the production of the consonant and
involves no articulatory programming.

2. Acoustic Measurements

F2 trajectories for pairs of VCV sequences for speakers Re, Bo and Ca were
lined up according to the same procedure used to study coarticulatory effects
for EPG data. Carryover effects for [j] could be measured only for speaker Re
since no reference point was available for line-up procedures for speakers Bo
and Ca. To detect transconsonantal effects, a procedure analogous to that for
EPG data was used; thus, effects were considered to occur when observable
frequency differences between two vowels caused analogous differences to
occur at some moment in time on the other side of the line-up point. This
method of analysis is exemplified below.

Figure 6 shows anticipatory effects for [uʎV] (top, lower panel) and
carryover effects for [Vʎu] (bottom, lower panel). Data correspond to speaker
Re and have been lined up at PMC with EPG data for the same speaker. For the
anticipatory condition, differences in F2 frequency can be observed during V2
as [i]>[a]>[u]. Anticipatory effects for V2 = [i]>[a] and V2 = [i]>[u] occur
between -15 ms and PMC; they never exceed 250 Hz. No differences for V2 =
[a]>[u] take place before PMC.

For the carryover condition, differences in F2 frequency can be observed
during V1 as [i]>[a]>[u]. Carryover effects for V1 = [i]>[a] and V1 = [i]>[u]
occur between PMC and V2 offset; the largest magnitude for the two effects is
found between 250 and 500 Hz. No effects take place after PMC for V1 =
[a]>[u].

Coarticulatory effects on F2, as determined visually from the plottings for
all contextual combinations of VCV pairs, are reported in Tables 5
(anticipatory effects) and 6 (carryover effects) analogously to Tables 3 and
4. The magnitude of the coarticulatory effects is plotted for different
magnitude levels (* 0-250 Hz; ** 250-500 Hz; *** more than 500 Hz); onset and
offset times are reported in ms.
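
The three magnitude levels used for the F2 effects can be expressed as a simple classifier. The function name is hypothetical, and the treatment of the boundary values (exactly 250 Hz and exactly 500 Hz fall in the lower band) is an assumption, since the text does not say how ties were resolved.

```python
def f2_level(difference_hz):
    """Map an F2 difference (Hz) to the *, **, *** levels of Tables 5 and 6."""
    size = abs(difference_hz)
    if size <= 250:   # assumed: 250 Hz itself counts as the lowest level
        return "*"
    if size <= 500:   # assumed: 500 Hz itself counts as the middle level
        return "**"
    return "***"


# Effects that never exceed 250 Hz are "*"; those between 250 and 500 Hz are "**".
levels = [f2_level(hz) for hz in (120, 250, 380, 640)]
```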

a. Anticipatory effects (Table 5). Similarly to the EPG data, the magnitude
of the coarticulatory effects increases slightly as tongue-dorsum contact
decreases for [ɲ]>[ʎ]>[n], while [j] shows similar effects to [...]. As for
the EPG data, speaker Re (but not speakers Bo and Ca) shows large anticipatory
effects for [Vji] vs. [Vja], [Vju] when V1 = [a] and [u] as a result of
tongue-dorsum reinforcement before PMC. Moreover, the same strategy is used by
all speakers for all palatal consonants when V1 = [i], including speaker Re,
who did not show anticipatory effects during the articulation of this V1 on
the surface of the palate.

Table 5

Acoustical data on significant anticipatory (V2-to-V1) effects for differences
in F2 frequency. The magnitude of the effects is plotted for different
magnitude levels (*0-250 Hz; **250-500 Hz; ***more than 500 Hz). Temporal
effects are indicated as for the EPG data (see Table 3).

Timing effects can be explained analogously to the EPG data. For consonants
involving a large degree of tongue-dorsum contact ([j] and [ɲ]), onset time
of the anticipatory effects occurs at V1 onset; for consonants involving less
tongue-dorsum contact ([n] and, to a lesser extent, [ʎ]), onset of
anticipatory coarticulation occurs at different moments in time before PMC
(mainly, at V1 onset and between -55 and -25 ms). Contrary to the EPG data,
as for [n], anticipatory effects for palatal consonants last until PMC.

Overall, anticipatory effects on F2 are in agreement with anticipatory
effects over the surface of the palate (see section IIB.1a) and, therefore,
are due to articulatory preprogramming. In some instances, different
coarticulatory trends take place in different regions of the vocal tract for
the same speaker; thus, articulations that require large degrees of
tongue-dorsum contact block coarticulatory effects about the period of
closure at the surface of the palate but not at other regions of the vocal
tract.

b. Carryover effects (Table 6). As for the EPG data, the magnitude of the
coarticulatory effects decreases (for [n]>[ʎ]>[ɲ]>[j]) inversely and
monotonically with the degree of tongue-dorsum contact (for [j]>[ɲ]>[ʎ]>[n]).
As for the EPG data, for all consonants (except for [j], for reasons
indicated in section IIB.2a), carryover effects are larger than anticipatory
effects; also, while effects from front vs. back vowels occur in the
anticipatory and carryover conditions, carryover effects for [n] are much
larger than anticipatory effects.

There is some indication that, in the carryover condition, timing may be
controlled with reference to the degree of tongue-dorsum constraint required
for the production of the consonant. Thus, carryover effects end at different
moments in time after PMC for consonants produced with small degrees of
tongue-dorsum contact ([n] and, to a lesser extent, [ʎ]) and last until V2
offset for consonants produced with large degrees of tongue-dorsum contact
([ɲ]). However, consistent with EPG data on carryover coarticulation, speaker
Re does not show this trend with respect to coarticulatory effects common to
all the consonants. Onset of carryover effects at +65/+80 ms after PMC for
[n] results from the cancellation of differences in F2 frequency for V1 = [i]
vs. [u] during closure due to oronasal coupling.

Overall, carryover effects on F2 are in agreement with carryover effects
on the surface of the palate (see section IIB.1b); they are due to mechanical
inertia constraints on articulatory activity and involve little or no
articulatory programming.

III. Summary and Conclusions 

The articulatory mechanisms that underlie the relationship between degrees
of tongue-dorsum contact and degrees of coarticulatory activity are
discussed. The dorsopalatal [j] is produced with one articulatory command for
the raising of the tongue dorsum while the tongue blade makes no contact over
the palate. For alveolo-palatals, two commands are actualized simultaneously:
tongue-blade occlusion and tongue-dorsum raising. As a result of this
synergistic activity, a large degree of contact is obtained for
alveolo-palatals over the entire surface of the palate. Thus, the production
of [j] vs. alveolo-palatals results in more tongue-dorsum constraint since all
muscular activity is directed exclusively towards tongue-dorsum raising; for
alveolo-palatals, on the other hand, [...] fact that muscular activity [...]
tongue-blade activity to make contact at the palatal surface (more so for
[...] than for [...]). Alveolar [n] is produced with a command to the tongue
tip against the alveolar region and involves no constraint on the tongue
dorsum to achieve dorsopalatal contact; in fact, contact with the tongue
dorsum for [n] occurs only at the sides of the palate. Data on the degree of
linguopalatal fronting and F2 frequency reported in this study show that
V-to-V coarticulatory effects for [j], [ɲ], [ʎ], and [n] can be predicted
from these differences in constraint on the tongue dorsum to achieve
dorsopalatal contact. Thus, effects have been found to vary monotonically and
inversely as a function of such differences in tongue-dorsum constraint.

Table 6

Acoustical data on significant carryover (V1-to-V2) effects for differences in
F2 frequency. The magnitude of the effects is plotted for different magnitude
levels (*0-250 Hz; **250-500 Hz; ***more than 500 Hz). Temporal effects are
indicated as for the EPG data (see Table 4).

Transconsonantal effects extended back to V1 onset (anticipatory effects)
and up to V2 offset (carryover effects), more so on the surface of the palate
([...]0% of anticipatory and carryover effects for the EPG data) than at other
regions of the vocal tract (60% of anticipatory and carryover effects for the
F2 data). There is evidence from the literature that acoustical measurements
can be less sensitive than articulatory measurements to coarticulatory effects
(Gay, 1974, 1977).

Directionality of coarticulatory effects has been taken to be inherent in
the programming of speech sequences (Kent, 1976). Several findings reported
in this study speak to this issue. Carryover effects were found to be larger
than anticipatory effects in the light of articulatory and acoustical data for
most consonants and speakers. From the present study, it can be concluded
that this finding reflects a language-specific property of how articulatory
programming is organized in Catalan. Evidence for a similar trend has been
found for English (Bell-Berti & Harris, 1976; Gay, 1974; MacNeilage & DeClerk,
1969).

The temporal extent of coarticulation was found to vary with differences
in tongue-dorsum contact, much more so for anticipatory effects than for
carryover effects. Thus, EPG data and acoustical data show that onset time of
anticipatory coarticulation is determined with higher precision (at V1 onset
vs. different times before PMC) as the degree of tongue-dorsum contact for the
consonant increases; on the other hand, EPG data and, less so, acoustical data
show that carryover effects extend up to V2 offset independent of the degree
of tongue-dorsum contact for the consonant. This finding is consistent with
the view that anticipatory coarticulation results from articulatory
preprogramming which, for the set of consonants investigated in this study,
operates with reference to the mechanical constraints involved during the
production of the consonant. Preprogramming also results in the reinforcement
of tongue-dorsum activity during V1 to anticipate the articulatory and
perceptual differentiation of a palatal consonant followed by V2=[i] vs. [a],
[u].



Data reported in this study support the view that the speech production
mechanism involves independent control of different regions of the vocal
tract. Thus, anticipatory effects on the surface of the palate (EPG data) but
not at other regions of the vocal tract (F2 data) are blocked about the period
of closure for articulations that involve large tongue-dorsum contact.

With respect to alternative models that have been proposed in the
literature to explain coarticulation in VCV utterances (syllabic or V-to-C
model and cross-syllabic or V-to-V model; see Gay, 1978, for discussion), data
reported here show that coarticulation is a context-dependent process, in the
sense that the extent to which V-to-V effects occur depends on the
articulatory mechanisms involved in the production of the entire VCV sequence.
The finding that coarticulation can be largely predicted from the degree of
constraint involved during the activity of specific articulators suggests that
the process of speech production is organized around precisely controlled
patterns of articulatory activity. Thus, it has been shown that, independent
of whether the primary constriction takes place at the palatal and/or alveolar
regions, the degree of tongue-dorsum contact needs to be specified with
accuracy. These observations are consistent with the view that linguistic
units are actualized in running speech by muscle groupings that are
synergistically controlled.

CATEGORICAL TRENDS IN VOWEL IMITATION: PRELIMINARY OBSERVATIONS FROM A
REPLICATION EXPERIMENT*



Bruno H. Repp and David R. Williams†



Abstract. The question whether isolated stationary vowels are
imitated in a continuous or categorical fashion, raised by Chistovich,
Fant, de Serpa-Leitao, and Tjernlund (1966) and followed up
primarily by Kent (1973), was pursued further by replicating Kent's
study with some slight modifications. The subjects (the two
authors) imitated synthetic stimuli from [u]-[i] and [i]-[ae]
continua at three different temporal delays. Acoustic analysis of the
response vowels revealed very similar patterns across delays. Both
subjects showed clear evidence of nonlinearities in the
stimulus-response mapping and of preferred response formant
frequencies, though strictly categorical responses were generally
absent. The origin of these nonlinearities and their relation to the
phonemic vowel categories of English are not fully understood at
present. Vowel imitation responses presumably reflect the joint
influences of perceptual and articulatory factors that need to be
disentangled in future research.

1. Introduction

Almost two decades ago, Chistovich and her colleagues (Chistovich, Fant,
de Serpa-Leitao, & Tjernlund, 1966) raised the interesting possibility that
(isolated, stationary) vowels might have an internal representation
intermediate between the auditory pattern, which presumably is a continuous
function of the input, and the discrete categories of the vowel phonemes in
the language. To warrant a separate existence in a theoretical model of speech
processing, such an intermediate level of representation must have a
noncontinuous structure distinct from that found at the phonemic level.

To test for this possible intermediate level, Chistovich, Fant, de
Serpa-Leitao, and Tjernlund (1966) used a vowel imitation task. There is
reason to believe that oral reproduction, particularly when it occurs as
rapidly as possible ("shadowing"), may bypass the level at which familiar
phonemic categories are associated with the input. This may be so even when
the imitation response occurs at some delay ("mimicking"):¹ Latencies of
shadowing and mimicking responses do not increase for phonemically ambiguous
vowel tokens, but the latencies of written responses do (Chistovich, de
Serpa-Leitao, & Tjernlund, 1966). The increase presumably reflects the time
needed to decide among several categorical response alternatives. The
hypothetical intermediate stage thus may serve to translate auditory
information directly into motor instructions, as well as to store such
information over time periods exceeding the life span of the raw auditory
trace. Indeed, this intermediate representation may be thought of as motor in
nature, thus anchoring it firmly in one of the three essential components of
any model of speech communication (sensory, motor, and central processing).
The question posed by Chistovich et al. thus concerns the continuous versus
discrete motor representation of (isolated, stationary) vowels.

To answer this question, Chistovich, Fant, de Serpa-Leitao, and Tjernlund
(1966) constructed a series of 12 synthetic vowel stimuli forming a continuum
between the Russian vowel categories [i]-[e]-[a]. The formant values of the
endpoint stimuli were closely matched to natural productions of the single
subject (L.C.), a female speaker of Russian. Her imitation responses in two
conditions (shadowing, with an average latency of 190 ms; mimicking, with an
average latency of 900 ms) were analyzed in three ways: (1) The formant
frequencies (F1, F2, F3) of the responses, obtained from spectrograms, were
plotted as a function of the formant frequencies of the synthetic stimuli.
(2) The standard deviations of the formant frequencies across multiple
imitations of the same stimulus were plotted in a similar fashion.
(3) Histograms of the frequencies of each formant across all responses were
constructed.
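
The three analyses just listed can be illustrated with a minimal sketch. It is not the procedure of Chistovich et al., who worked from spectrograms and plots; the function name `summarize_imitations`, the 50-Hz bin width, and the demonstration values are all hypothetical and serve only to show what is computed for each stimulus.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_imitations(responses, bin_width=50):
    """Summarize response formant values (Hz) for one formant.

    responses maps a stimulus number to the list of formant values
    measured in repeated imitations of that stimulus.
    """
    # (1) Mean response value per stimulus (plotted against the stimulus value).
    means = {stim: mean(values) for stim, values in responses.items()}
    # (2) Standard deviation across repeated imitations of the same stimulus.
    sds = {stim: stdev(values) for stim, values in responses.items()}
    # (3) Histogram of all responses pooled, keyed by the lower bin edge in Hz.
    pooled = [value for values in responses.values() for value in values]
    hist = Counter(int(value // bin_width) * bin_width for value in pooled)
    return means, sds, dict(sorted(hist.items()))


# Hypothetical F2 values (Hz) for three stimuli, three imitations each.
demo = {1: [2200, 2210, 2190], 2: [2100, 1900, 2000], 3: [1800, 1810, 1790]}
means, sds, hist = summarize_imitations(demo)
```

In such a summary, a stepwise mean function, peaks in the standard deviations between steps, and a multimodal pooled histogram would be the three signs of categorical imitation that the original analyses looked for.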

For the mimicking task, the formant frequency plots revealed stepwise
changes in F2 suggesting at least four distinct categories. This impression
was supported by the presence of three peaks in the standard deviation plot,
corresponding to the boundaries between those categories. Finally, the F2
histograms also seemed to represent four distinct distributions of frequency
values. The shadowing data seemed to agree, although only the first type of
analysis was presented and some additional arguments were required to
establish the similarity. Phonetic transcriptions by the (sophisticated)
subject of her own responses also suggested four categories, corresponding to
/i/, /ɛ/ (or /e/), /ae/, and /a/, and possibly a fifth category (/I/).
Further support for this division came from a vowel matching task that
required long-term memory for a fixed standard, using subject L.C. and the
same set of stimuli. Chistovich et al. concluded that there is a discrete
representation of vowels in terms of categories whose number exceeds that of
the relevant phonemic categories in the language, and that this representation
is used both to guide articulation and to retain vowel sounds in long-term
memory.

Although these results are suggestive and challenging, they are not
without weaknesses: (1) There was convincing evidence for only one category
(/ae/) beyond the three phonemic categories that, a priori, might have been
expected to play a role. That extra vowel category, moreover, is functional in
Swedish, the language of the country where Chistovich et al. conducted their
study. Thus, the results are not inconsistent with a two-stage model, in
which the outcome of a rapid phonemic decision (prior to the stage of response
selection) constrains the motor program for imitation, without any
intermediate stage. (2) The conclusions are based entirely on the pattern of
F2 frequencies. Although there appeared to be some correlated trends in F1 and
F3, these data were not discussed by Chistovich et al. (3) The analysis of
shadowing responses is incomplete and not totally conclusive. (4) All the
analyses are qualitative; no statistical criterion for, e.g., detecting steps
in a function was specified. (5) There was only a single, highly
sophisticated subject.

Some preliminary follow-up data were reported by Chistovich, Fant, and de
Serpa-Leitao (1966). Several subjects (including L.C.) imitated noise-excited
synthetic two-formant vowels lying along arbitrary trajectories in the F1-F2
space. Again, some stepwise changes in response formant frequencies were
observed, but the data show some remarkable irregularities and are suggestive
at best. Some of the stimuli may have exceeded the range of formant values
subjects were able to produce.

A full-scale replication, with some modifications, was attempted by Kent
(1973). He used two 11-member vowel continua, ranging from [u] to [i] and [i]
to [ae], respectively. The formant values of the endpoint stimuli were
derived from Peterson and Barney (1952). Four English-speaking subjects who
had some phonetic training imitated each vowel 10 times. There was no time
pressure; subjects had up to 7 s to respond.

The [i]-[ae] continuum roughly corresponds to two-thirds of the [i]-[a]
continuum used by Chistovich, Fant, de Serpa-Leitao, and Tjernlund (1966). As
Kent points out, this continuum is "phonemically rich" in English, spanning as
many as five categories (/i/, /I/, /e/, /ɛ/, /ae/). The [u]-[i] continuum, on
the other hand, contains no familiar vowel categories between the endpoints,
despite covering a much larger range of F2 frequencies (though with F1 nearly
constant). The question of interest, then, was whether categorical tendencies
in vowel imitations, if replicated, would be restricted to the [i]-[ae]
continuum.

Kent interpreted his data as providing an affirmative answer to this
question. Response F2 frequencies along the [u]-[i] continuum, while not a
strictly linear function of stimulus F2, did not exhibit any clear steps for
any of the four subjects. The standard deviations, however, showed pronounced
peaks: a single central peak for subjects 1 and 3, and twin peaks for subjects
2 and 4. Moreover, the F2 histograms tended to have several modes for each
subject: two for subjects 1 and 3, and three for subjects 2 and 4. Finally,
comparison of these patterns with the F2 frequency plots shows that increased
standard deviations and histogram valleys correspond to regions of increased
slope on the F2 frequency plots. Kent is very conservative in interpreting
these data, conceding only that responses are more accurate at the [u] and [i]
ends of the continuum. In fact, his data offer some support for the presence
of three categories in two of the four subjects. It is true, however, that
these categories correspond only to regions of reduced discrimination, never
to a true constancy in imitation. No labeling data were collected for these
stimuli, nor was F3 analyzed.

The results for the [i]-[ae] continuum were presented by Kent in an
incomplete and somewhat confusing manner, which makes it difficult to object
to his conclusion that these data were hard to interpret. The response
locations in F1-F2 space (F3 was not analyzed) showed some pronounced
discontinuities, but they occurred in different places for different subjects
and agreed only partially with the patterns of standard deviations and
frequency histograms. For only one subject could histogram peaks be
tentatively aligned with the response categories used in labeling the [i]-[ae]
stimuli. From these results, Kent drew the "very tentative conclusion that the
stimuli of the /i/-/ae/ series are represented in memory in a more
classificatory fashion than are the stimuli of the /u/-/i/ series" (Kent,
1973, p. 16). The emphasis on memory is explained by the lack of time
constraints in Kent's task.

The available data on categorical tendencies in vowel imitation must
therefore be considered as merely suggestive. Both Kent (1973) in his
concluding paragraph and, quite recently, Chistovich (1984) acknowledge that
further data on this issue are needed. Their initial studies do not seem to
have been followed up.² Kent went on to conduct a series of interesting
vowel imitation studies, but none of these addresses directly the issue of
categorical tendencies (Kent, 1974, 1978, 1979; Kent & Forner, 1979). A
relevant study by Schouten (1977), in which stimuli from an [i]-[ae] continuum
were presented to Dutch-English bilingual subjects, yielded some evidence of
categorical imitation, but the data were complex and were analyzed by quite
different procedures, such as cluster analysis. Thus, the ambiguity of the
earlier results remains.

We decided to continue where Kent (1973) left off, and to begin with a
straightforward replication of his study. In this preliminary report we
present data obtained for the two authors as subjects. There were a few
methodological differences between our study and Kent's. The most important of
these was that we employed three different response timing conditions, in an
attempt to incorporate the "shadowing" versus "mimicking" comparison
(Chistovich, Fant, de Serpa-Leitao, & Tjernlund, 1966) into Kent's design.
Subjects were required to imitate each vowel (a) immediately, (b) following a
short (750 ms) delay, and (c) following a longer (3 s) delay. Other procedural
changes were relatively minor: First, the stimuli were presented over a
loudspeaker rather than over earphones. Second, to prevent overlap of
shadowing responses with the end of the stimulus, the synthetic vowels were
shortened to 150 ms (versus 250 ms [Kent, 1973] and 300 ms [Chistovich, Fant,
de Serpa-Leitao, & Tjernlund, 1966]). Extrapolating from perceptual findings
(Pisoni, 1973), shortening of vowel stimuli should, if anything, result in a
more categorical response. Finally, there were 12 (rather than Kent's 11)
stimuli per vowel continuum.



2. Methods 



2.1. Subjects 



The two authors served as subjects in this pilot study. BR is a native
speaker of Southern German who has been speaking English almost exclusively
for over 15 years. He also has smatterings of French, Italian, and Swedish,
but no professional phonetic training. DW is a native speaker of American
English from California. Although not a fluent speaker of any foreign
languages, he has studied French, German, Japanese, and Latin, and has had
several years of phonetic training.



2.2. Stimuli



Five-formant vowel stimuli were generated on the software serial
synthesizer at Haskins Laboratories. The center frequencies of the fourth and
fifth formants were fixed at 3500 and 4500 Hz, respectively. All stimuli were
150 ms in duration and all formants were stationary with amplitude rise-fall
times of 30 ms. Bandwidths of the five formants were set at 50, 80, 110, 150,
and [...] Hz, respectively. For all stimuli, the fundamental frequency fell
linearly from an initial value of 120 Hz to a final value of 105 Hz.

Two stimulus continua, one from [u] to [i] and the other from [i] to
[ae], were employed in the experiment. Frequency values for F1, F2, and F3 of
the endpoint stimulus vowels corresponded to the mean values reported by
Peterson and Barney (1952) for their male talkers. Two 12-stimulus vowel
series were constructed by interpolating in equal frequency steps between the
endpoint values for the first three formants. The formant frequency values of
all stimuli are listed in Table 1. Twelve randomized blocks of the twelve
stimuli were arranged to form a stimulus sequence. The order of items in the
sequence was adjusted until it was almost perfectly balanced; i.e., each
stimulus was preceded once by every other stimulus, with a few exceptions. For
each vowel continuum, three such stimulus sequences were recorded on tape for
presentation in the imitation task.
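
The construction of a 12-step series by interpolating in equal frequency steps can be sketched as follows. This is only an illustration under stated assumptions: the function name `interpolate_continuum` is hypothetical, the endpoint values are those given for the [u]-[i] continuum in Table 1, and rounding to the nearest hertz is an assumption, so individual values may differ by about 1 Hz from the tabulated ones.

```python
def interpolate_continuum(start, end, n_steps=12):
    """Return n_steps (F1, F2, F3) triples equally spaced in Hz.

    start and end are the formant triples of the two endpoint vowels.
    """
    return [
        tuple(round(a + (b - a) * k / (n_steps - 1)) for a, b in zip(start, end))
        for k in range(n_steps)
    ]


# Endpoint values (Hz) of the [u]-[i] continuum as listed in Table 1.
u_endpoint = (300, 870, 2240)
i_endpoint = (270, 2290, 3010)

u_to_i = interpolate_continuum(u_endpoint, i_endpoint)
for number, (f1, f2, f3) in enumerate(u_to_i, start=1):
    print(number, f1, f2, f3)
```

With these endpoints the step sizes are about 2.7 Hz for F1, 129 Hz for F2, and 70 Hz for F3, which shows why F1 is nearly constant along this series while F2 covers a wide range.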

Table 1

Formant Frequencies of the Stimuli

[u]-[i] continuum

Stimulus     F1      F2      F3
    1       300     870    2240
    2       297     999    2310
    3       295    1128    2380
    4       292    1257    2450
    5       289    1386    2520
    6       286    1515    2590
    7       284    1644    2660
    8       281    1773    2730
    9       278    1902    2800
   10       276    2031    2870
   11       273    2161    2940
   12       270    2290    3010

[i]-[ae] continuum

Stimulus     F1      F2      F3
    1       270    2290    3010
    2       305    2238    2955
    3       341    2186    2901
    4       376    2134    2846
    5       412    2082    2792
    6       447    2030    2737
    7       483    1979    2683
    8       518    1927    2628
    9       554    1876    2574
   10       589    1824    2519
   11       625    1772    2465
   12       660    1720    2410

2.3. Imitation Task

Each of the 144 trials in a stimulus sequence required six seconds.
Within that interval, the subject was presented with a 150 ms stimulus vowel
that was either preceded or followed by a 1000 Hz tone, 100 ms in duration.
The position of the tone in a trial depended on the condition. In the
immediate imitation condition (A), a trial began with the tone, which served
as a warning signal; its onset preceded that of the stimulus vowel by 500 ms.
Subjects were asked to produce a response vowel as soon as possible after the
onset of the stimulus vowel. In the delayed (B) and deferred (C) imitation
conditions, a trial began with a stimulus vowel, the onset of which preceded
that of the tone by 750 ms and 3000 ms, respectively. Subjects were asked in
these conditions to respond immediately upon the onset of the tone, which
served as a "go" signal here. All responses were initiated from a closed
mouth position; that is, phonation was always preceded by a silent opening
gesture.

Each of the three stimulus sequences was divided into three parts of 48
trials each, separated by pauses and assigned to the three imitation
conditions as follows: (1) A, B, C; (2) B, C, A; (3) C, A, B. Before hearing
the stimulus sequences, subjects listened to one 12-item block of trials from
each of the imitation conditions for practice. The [u]-[i] session preceded
the [i]-[ae] session for both subjects.
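
The assignment of conditions to sequence parts described above amounts to a rotation of the condition order across the three sequences. The following sketch reproduces that assignment; the function name `assign_conditions` and the representation of each part as a (condition, first trial, last trial) triple are hypothetical choices made for illustration.

```python
def assign_conditions(conditions=("A", "B", "C"), trials_per_part=48):
    """Rotate the condition order across sequences (a 3 x 3 Latin square).

    Returns one list per stimulus sequence; each entry is
    (condition, first_trial, last_trial) with trials numbered from 1.
    """
    plan = []
    for seq in range(len(conditions)):
        # Sequence 1 uses A, B, C; sequence 2 uses B, C, A; and so on.
        order = conditions[seq:] + conditions[:seq]
        parts = [
            (cond, part * trials_per_part + 1, (part + 1) * trials_per_part)
            for part, cond in enumerate(order)
        ]
        plan.append(parts)
    return plan


plan = assign_conditions()
for number, parts in enumerate(plan, start=1):
    print(number, parts)
```

Because each part of 48 trials covers four blocks of the twelve stimuli, every condition receives 4 x 3 = 12 responses per stimulus across the three sequences, which matches the 12 responses per stimulus token used in the analysis.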

The stimuli were presented in a sound-insulated (IAC) booth over a
Realistic loudspeaker placed about 20 degrees to the right of center at a
distance of 3 feet from the subject. A Sennheiser MKH415T microphone was
placed directly in front of the subject approximately eighteen inches from his
lips. Both the stimuli and the subject's responses were recorded on a second
tape recorder.

2.4. Identification Tasks 

In addition to the imitation responses, each subject provided written
phonemic and numeric identifications of the stimuli. For the phonemic
identification task, the stimulus vowels from the [i]-[ae] series were
recorded twelve times in random sequence with ISIs of 3 s. The subjects used
the labels /i, I, e, ɛ, ae/. This identification task followed the imitation
task by several weeks. Phonemic identification of the [u]-[i] stimuli was not
attempted, since both authors found it difficult to think of appropriate
response categories.

Several weeks later, the subjects identified the stimuli from both
continua on a numerical scale ranging from 1 to 12. To facilitate this
absolute identification task, the stimuli were presented in an AXB format,
with the A and B stimuli representing the continuum endpoints, in either
order. A total of 12 x 12 = 144 triads were presented for each continuum, with
ISIs of 1 s within triads and 3 s between triads.

Each identification task was preceded by a familiarization sequence in
which the stimulus series was presented four times, twice in forward order and
twice in reverse order.

2.5. Analysis

The imitation recordings were digitized at a sampling rate of 10 kHz.
With a waveform editor, the time interval between the onset of the warning
tone and the onset of the response vowel was measured. Each response vowel
was then isolated, labeled, and stored in a disk file. The center frequencies
of the first three spectral peaks were obtained using LPC analysis, with a
20-ms window moving in 10-ms steps. For each formant, three separate
estimates were obtained: (a) the mean formant frequency over the whole vowel,
including its standard deviation across the vowel token; (b) its value at the
onset of the response vowel; (c) its value two-thirds into the vowel (the
measurement point used by both Chistovich, Fant, de Serpa-Leitao, and
Tjernlund [1966] and Kent [1973]). These values were further averaged across
the 12 responses to each stimulus token, and the standard deviation for each
stimulus token was computed. Formant frequency histograms for each delay
condition and for all delay conditions combined were obtained for each
stimulus in a series using Program 5D of the BMDP package.
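
The three estimates taken from each formant track can be sketched as below. This is an illustration only: the LPC analysis itself is not reproduced, the track is assumed to be a list with one formant value per 10-ms analysis frame, and the rule used to pick the frame "two-thirds into the vowel" is an assumption. The names `formant_estimates` and `frame_count` are hypothetical.

```python
from statistics import mean, stdev


def frame_count(duration_ms, window_ms=20, step_ms=10):
    """Number of complete analysis windows that fit in a vowel token."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // step_ms + 1


def formant_estimates(track_hz):
    """Return the three estimates for one formant of one response vowel."""
    if not track_hz:
        raise ValueError("empty formant track")
    last = len(track_hz) - 1
    # Assumed rule: the frame nearest to two-thirds of the way through the track.
    two_thirds_index = round(2 * last / 3)
    return {
        "mean": mean(track_hz),                               # estimate (a)
        "sd": stdev(track_hz) if len(track_hz) > 1 else 0.0,  # within-token SD
        "onset": track_hz[0],                                 # estimate (b)
        "two_thirds": track_hz[two_thirds_index],             # estimate (c)
    }


# Hypothetical F2 track (Hz) that falls slightly over the vowel.
track = [2000, 1990, 1980, 1970, 1960, 1950, 1940]
estimates = formant_estimates(track)
```

For a slowly falling track such as this one, the three estimates differ by only a few tens of hertz, which is the situation in which the choice of measurement point matters little.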

3. Results and Discussion

3.1. Latencies

Chistovich, Fant, de Serpa-Leitao, and Tjernlund (1966) reported
essentially constant response latencies for subject L.C. across their vowel
continuum, both in the rapid shadowing condition (average latency: 190 ms) and
in the slower mimicking condition (average latency: 900 ms). Kent (1973) did
not measure reaction times. The average latencies for the vowels along the
two continua used in the present study are plotted in Figure 1 as a function
of stimulus number, separately for the two subjects and for the three
imitation conditions. Although there is more variability here than in the data
of Chistovich et al., the functions are either flat or vary in a nonsystematic
fashion. The standard deviations were rather large (about 60 ms on the
average), which must be taken into account when interpreting the latency
functions. Despite occasional peaks and valleys, these functions do not
provide any strong evidence of systematic variation in latency across the
vowel continua. Thus, the conclusion of Chistovich et al. that imitation
bypasses conscious response selection, regardless of delay, is not
contradicted.

Some striking differences among conditions are evident in Figure 1. Both
subjects showed consistently longer reaction times for immediate than for
delayed (+750 ms) imitation. (Note that the delayed-imitation latencies were
measured from the onset of the "go" signal; to obtain stimulus-response
latencies, 750 ms must be added to the times shown.) This is not unexpected:
After all, the subjects knew exactly what to produce when the "go" signal
occurred in the delayed condition, whereas in the immediate condition they had
no such advance knowledge, and therefore had to wait until at least part of
the stimulus vowel had been processed. It should also be noted that the
latencies of subjects DW and BR were much longer than those of L.C. Neither
of the present subjects was capable of the very close shadowing evinced by
L.C. (Chistovich, Fant, & de Serpa-Leitao, 1966). The fact that all responses
were initiated from a closed mouth position may have something to do with
this; L.C.'s articulatory resting position was not reported.
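
The relation between latencies measured from the tone and latencies measured from the stimulus vowel can be made explicit with a small sketch. The offsets are those given in the description of the imitation task; the function name `stimulus_response_latency` is hypothetical, and the treatment of condition A (subtracting the 500-ms lead of the warning tone) is an assumption, since the text only states the correction for the delayed condition.

```python
# Tone onset relative to stimulus vowel onset, in ms (negative: tone comes first).
TONE_OFFSET_MS = {"A": -500, "B": 750, "C": 3000}


def stimulus_response_latency(condition, tone_to_response_ms):
    """Convert a tone-to-response interval into a stimulus-to-response latency."""
    if condition not in TONE_OFFSET_MS:
        raise ValueError(f"unknown condition: {condition!r}")
    return tone_to_response_ms + TONE_OFFSET_MS[condition]


# Hypothetical tone-to-response intervals (ms), chosen only for illustration.
immediate = stimulus_response_latency("A", 900)
delayed = stimulus_response_latency("B", 300)
deferred = stimulus_response_latency("C", 350)
```

Expressed this way, a response in the delayed condition always follows the stimulus by more than 750 ms, even when its latency from the "go" signal is shorter than the latency of an immediate imitation.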

Figure 1. Response latencies (averaged over 12 responses) as a function of
stimulus number, imitation condition, vowel continuum, and subject.

In the deferred imitation condition, there was increased temporal
uncertainty about the moment of occurrence of the "go" signal; hence, an
increase in reaction times might have been expected relative to the delayed
imitation condition. Such an increase was shown by BR but not (or to a much
lesser extent) by DW. The reason for this difference between subjects is not
clear. Another difference of uncertain origin is DW's slower response to the
[i]-[ae] continuum.

3.2. Formant Frequencies

Presentation of formant frequency data is simplified by the finding that
two factors seemed to play only a very minor role. First, formant frequencies
remained roughly constant or fell slightly over the course of the response
vowels. Frequency measures at response vowel onset and two-thirds into the
vowel followed patterns almost identical to those of the mean formant
frequencies across the whole vowel; therefore, the latter were chosen as the
primary dependent variable. Second, formant frequencies were virtually
identical across the three imitation conditions. Therefore, mean formant
frequencies will be presented averaged across conditions (n = 36 per
stimulus).

Figure 2 plots the formant frequencies of the two subjects' responses to
the 12 members of each vowel continuum. The formant frequencies of the
stimulus vowels are indicated by the dashed lines. These plots are modeled
after Figure I-A-4 of Chistovich et al. [2], which showed several steps,
especially in the function for F2. No such steps (i.e., true plateaus) can
readily be discerned in Figure 2, except at the [u] end of the [u]-[i]
continuum for DW. The response formant frequencies are not a linear function
of the stimulus formant frequencies, however. Changes in the slope of the F2
function are evident especially along the [u]-[i] continuum, and also at the
[i] end of the [i]-[ae] continuum for DW. Note also that the [i] stimulus was
not well matched to the subjects' vocal capabilities, even though it was based
on Peterson and Barney's (1952) male norms: For both DW and BR, F2 and
especially F3 frequencies in the response vowels were much lower than in
[i]-like stimuli.

A closer examination of these stimulus-response relationships is possible
in Figures 3 and 4, which present two-dimensional formant frequency plots.
Figure 3 shows the data for the [u]-[i] continuum in F2-F3 space; F1 varied
very little across this series. Each stimulus vowel is connected by a line to
its corresponding response. There is an obvious similarity between the two
subjects' data in that the linear stimulus continuum is mapped onto a
compressed, curvilinear contour in F2-F3 space. It is also evident that there
are local expansion and compression effects that are quite different for the
two subjects. DW did not distinguish among stimuli 1-3; this is the only
clear instance of a categorical response in the present data. His responses
to stimuli 3-6 were spaced widely apart, while stimuli 6-9 were mapped onto a
compressed response space; responses to stimuli 9-12 roughly matched the
distances between stimuli. Stimuli 6-9 could be interpreted as forming a
second category on this continuum, although DW clearly was capable of
responding discriminatively within that category. The stimulus-response
mapping for BR is quite different. Major gaps appear between responses to
stimuli 2-3, 4-5, and especially 9-10. Thus, four clusters of responses may be
distinguished, perhaps corresponding to categories of some kind. Clearly,
however, BR was able to respond distinctively to stimuli within each of these
categories.

It might be added that the stimulus-response functions for F1, which are
difficult to discern in Figure 2, were quite systematic and different for the
two subjects, even though F1 changed over only a 30 Hz range. For DW, F1 was
practically constant for responses to stimuli 1-10 and then decreased rapidly.
For BR, on the other hand, F1 was roughly constant for stimuli 1-4 and then
decreased almost continuously, although a local increase for stimulus 9 might
lead one to consider stimuli 5-9 as a second grouping. Such a grouping would
be consistent with the patterning seen in the lower panel of Figure 3.


Figure 2. Response formant frequencies (averaged over 36 responses) as a
function of stimulus number, vowel continuum, and subject. The dashed lines
represent the stimulus formant frequencies.

Figure 3. Average responses (circles) to stimuli from the [u]-[i] continuum
(diamonds) in F2-F3 space. Corresponding stimuli and responses are connected
by straight lines.

Figure 4. Average responses (circles) to stimuli from the [i]-[ae] continuum
(diamonds) in F1-F2 space.

The data for the [i]-[ae] continuum are shown in Figure 4; they are
plotted in F1-F2 space, disregarding F3. Again, the response trajectories in
this space are similar for the two subjects; they are very nearly linear,
parallel to the stimulus continuum, and compressed, although more so for BR.
This linearity reflects the fact that, as in the stimuli, changes in F1 and F2
were correlated in the responses. The correlation (computed between the F1
and F2 frequency differences of responses to adjacent stimuli on the
continuum) was 0.97 for DW and 0.81 for BR (p < .001). The exact
stimulus-response mapping, however, again shows individual differences. DW has
some striking gaps between responses to stimuli 2-3-4; responses to stimuli
4-5, 7-8, and 10-11, on the other hand, are very similar. Clearly, there are
strong distortions in this mapping, but they do not reveal a clear categorical
structure. The same can be said for BR's data, although the distortions are
less strong here, with the largest gap occurring between responses to stimuli
3-4. Neither subject's F3 values lead to different conclusions, as is evident
from Figure 2: For DW, F3 changed rapidly in response to stimuli 1-4 and then
remained completely insensitive to stimulus variations; for BR, similarly, F3
decreased more rapidly at first and then decreased more gradually from
stimulus 4 on. There are no indications of any local categories in these F3
functions.
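
The correlation described above can be illustrated with a minimal sketch. The
function names and the example values below are hypothetical and are not the
data of this study. The text does not state whether signed or absolute
differences were correlated, so the use of absolute step sizes here is an
assumption.

```python
from math import sqrt


def step_sizes(means):
    """Absolute differences between mean responses to adjacent stimuli."""
    return [abs(b - a) for a, b in zip(means, means[1:])]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def step_correlation(f1_means, f2_means):
    """Correlate F1 and F2 step sizes between adjacent stimuli (assumed method)."""
    return pearson_r(step_sizes(f1_means), step_sizes(f2_means))


# Hypothetical mean response formants (Hz) for four adjacent stimuli.
f1_example = [300.0, 320.0, 360.0, 370.0]
f2_example = [2200.0, 2160.0, 2080.0, 2060.0]
r = step_correlation(f1_example, f2_example)
```

In this made-up example the F2 steps are exactly twice the F1 steps, so the
correlation is at its maximum; a 12-member continuum would yield 11 step sizes
per formant.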

To summarize, these formant frequency data offer ample evidence for
nonlinearities in stimulus-response mapping, but little evidence for response
categories in the strict sense. Moreover, they offer little support for
Kent's (1973) tentative conclusion that responses to the [i]-[ae] continuum
are more categorical than those to the [u]-[i] continuum. We turn now to an
examination of the formant frequency standard deviations.

3.3. Standard Deviations

To simplify presentation, the standard deviations, like the formant
frequencies, will be presented averaged across the three imitation conditions.
These average within-condition standard deviations do not include
between-condition variability, but this variability was small, as noted above.
The absolute magnitude of response variability was comparable across
conditions. The patterns of standard deviations, on the other hand, showed
considerable differences among conditions. These differences must be viewed
with caution, however, because of the similarity in mean formant frequencies
across conditions; moreover, we have no estimate of measurement error (i.e.,
of the random variability of standard deviations). Some differences will be
mentioned below.

The standard deviations for all three formants are displayed in Figure 5,
separately for the two subjects and the two stimulus continua. Some general
observations may be made at the outset: For both subjects, the standard
deviations of F1 are larger on the [i]-[ae] continuum than on the [u]-[i]
continuum, perhaps because only the former requires active control of degree
of jaw opening. The standard deviations of F2, on the other hand (and, to some
extent, those of F3), are much larger and more variable on the [u]-[i]
continuum. This probably reflects the relative unfamiliarity of the vowel
sounds along that continuum (cf. Kent, 1973). The fact that F3 variability is
often lower than F2 variability may also be noted.


Figure 5. Formant frequency standard deviations (averaged over imitation
conditions) as a function of stimulus number, vowel continuum, and subject.

If there is a quasi-categorical structure underlying the imitation
responses, then the standard deviations should increase whenever the slope of
the formant frequency function increases (i.e., in the "between-category"
regions). The striking peak in the F2 function for DW on the [u]-[i] continuum
indeed coincides with the rapid change in response F2 frequency between
stimuli 3 and 6 (see Fig. 2). The peaks at stimuli 4 and 9 in the F2 function
for BR, on the other hand, have no such clear correlate in the pattern of mean
F2 frequencies (Fig. 2). A finding not shown in Figure 5 is that, for both
subjects, the F2 standard deviation peaks along the [u]-[i] continuum were
most pronounced in the immediate imitation condition. The patterns of F1 and
F3 standard deviations show only occasional correspondence to the pattern of
F2 standard deviations, and they also varied considerably across conditions.
(The large standard deviation of F3 for the [i] endpoint stimulus on both
stimulus continua for subject DW was entirely due to the delayed imitation
condition, for unknown reasons; hence the parentheses in Figure 5.)

Along the [i]-[ae] continuum, a correlation of F1 and F2 standard
deviations can be seen for DW (r = 0.70, p < .01) but not for BR (r = -0.03).
Standard deviation peaks for F1 seemed most pronounced in the immediate
imitation condition. The pattern of F1 and F2 standard deviations finds some
correspondences in slope changes in the formant frequency functions (Fig. 2).
Correlations between the standard deviations and the absolute formant
frequency differences between responses to stimuli n and n+1 on the [i]-[ae]
continuum were significant for subject DW, both for F1 (r = 0.80, p < .01) and
for F2 (r = 0.74, p < .01). For BR, on the other hand, the standard deviations
were not significantly correlated with the pattern of response formant
frequencies.

3.4. Frequency Distributions

As might have been expected from the similarity in mean formant
frequencies across imitation conditions, the formant frequency distributions
for each continuum were highly similar across conditions also. Therefore,
histogram envelopes will be presented for responses in all three conditions
combined (n = 432 per continuum). They are shown in Figures 6 and 7.

The data for the [u]-[i] continuum appear in Figure 6. The F1
distributions at the bottom are strongly unimodal and reflect the general
upward shift of F1 in the subjects' responses, relative to the stimuli. The F2
distributions in the middle panels, on the other hand, cover the stimulus
range rather well and show evidence of trimodality for both subjects. The
first and third peaks of both subjects are located similarly near the
endpoints of the continuum and presumably represent /u/ and /i/ categories,
respectively. The middle peak is located differently: closer to [i] for DW but
in the center of the continuum for BR. The clear trimodality of these
distributions is surprising after the somewhat ambiguous patterns of the
formant frequency and standard deviation curves. It indicates definite
response preferences (or avoidances?) on the part of the subjects, although it
should be noted that the speakers were able to produce any F2 frequency along
the continuum, except for that corresponding to the [i] endpoint stimulus. The
F2 response distributions for individual stimuli (not shown here) were also
examined for bimodality. Although there were several distributions with two
peaks (e.g., stimuli 4 and 5 for DW, stimuli 3 and 9 for BR), these peaks
generally did not coincide with the major peaks seen in the overall histogram.
They might indicate tendencies to avoid certain F2 values, or else they
reflect just random variability. The F3 distributions (top panels) are
strongly skewed toward low frequencies, and considerable "undershoot" is
present, as noted earlier. In addition to the major peak reflecting the
constancy of F3 over part of the continuum (cf. Fig. 2), two minor peaks may
be distinguished for each subject; however, their locations are only partially
consistent with those of the F2 peaks.
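
As a rough illustration of how modality might be assessed in such response
distributions, the following sketch bins formant values and lists the bins that
exceed both neighbors. It is only an assumed procedure: the bin width and any
smoothing behind the histogram envelopes are not given in the text, and the
function names are hypothetical.

```python
def histogram(values, low, high, n_bins):
    """Count values falling into n_bins equal-width bins spanning [low, high]."""
    if n_bins < 1 or high <= low:
        raise ValueError("need n_bins >= 1 and high > low")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        if value < low or value > high:
            continue  # ignore responses outside the plotted range
        # A value equal to `high` is assigned to the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


def peak_bins(counts):
    """Indices of bins whose count is strictly greater than both neighbors."""
    peaks = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if count > left and count > right:
            peaks.append(i)
    return peaks
```

A trimodal distribution in this sense would give three peak bins. With narrow
bins, random fluctuations also produce local maxima, so some smoothing or a
minimum peak height would be needed before peaks are read as response
preferences.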

The histogram envelopes for the [i]-[ae] continuum are plotted in Figure
7. The F1 distributions at the bottom show good coverage of the stimulus
range as well as very pronounced peaks, four for DW and three for BR. The
peaks correspond to different stimuli for the two subjects. The absolute
frequency locations of the three major peaks, however, are remarkably similar:
325, 475, and 550 Hz (±12 Hz) for DW; 324, 468, and 530 Hz (±7 Hz) for BR.
There is much less correspondence in the F2 distributions for the two subjects
(middle panels). For DW, the F2 histogram is clearly trimodal: One mode is
in the [i]-[I] region and two modes are in the [ɛ]-[ae] region. The
distribution for BR is less clear, having five peaks. Note the general
asymmetry of these distributions; there seemed to be a strong "pull" toward
lower F2 frequencies in the subjects' responses, apart from the absolute
downward shift in F2. The F2 distributions for some individual stimuli (not
shown) showed signs of bimodality (e.g., stimuli 3 and 8 for DW, stimuli 3 and
9 for BR), but not so strongly as to suggest discrete response categories. As
on the [u]-[i] continuum, the whole range of F2 frequencies was represented in
the responses. The F3 distributions (top panels) were unimodal and were
centered at the lower end of the stimulus range for both subjects (cf. Fig. 2).

Figure 6. Formant frequency distributions for responses to stimuli along the
[u]-[i] continuum. The heavy line at the bottom of each graph indicates the
range of formant values along the stimulus continuum. Numbered arrows indicate
stimuli for which the mean formant frequencies of the responses fell in the
vicinity of peaks in the histogram. Note that the distributions for the
different formants are not aligned with respect to each other or across
subjects; rather, they are spread out over a constant number of histogram
bins.

Figure 7. Formant frequency distributions for responses to stimuli along the
[i]-[ae] continuum.

These data may be summarized by stating that (1) subjects show pronounced
preferences for particular formant frequencies in their responses to both
stimulus continua, and (2) the two subjects' responses are rather similar with
regard to F1 and F3, with individual differences residing primarily in the F2
distributions, particularly in responses to stimuli from the centers of the
continua.

3.5. Written Identification

The subjects' phonemic labeling responses to the stimuli on the [i]-[ae]
continuum are shown in Figure 8. It is evident that DW and BR applied somewhat
different criteria, possibly because of their different language backgrounds.
DW divided the continuum fairly consistently into five categories: /i/
(stimuli 1-4), /I/ (5), /e/ (6), /ɛ/ (8-11), and /ae/ (12). Stimulus 7 was
ambiguous (i.e., less than 75 percent responses in any category). BR, on the
other hand, applied only four categories consistently: /i/ (1-2), /e/ (4-5),
/ɛ/ (7-9), and /ae/ (11-12). Stimuli 3, 6, and 10 were ambiguous to this
listener. The /I/ category was not consistently applied by him to any
stimulus; stimuli 2-7 received a few responses in that category. Essentially,
for BR the /e/ category occupied the place of DW's /I/ and /e/ categories.
A prominent role of /e/ might be expected in a native speaker of German, which
has a monophthongal /e/ phoneme; however, the /I/ phoneme is also distinctive
in German, not to speak of BR's long exposure to English. It will also be
noted that BR's phoneme boundaries are all shifted toward the lower end of the
continuum relative to DW's boundaries. The reason for this shift is not
immediately evident.
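
The 75 percent criterion for a consistently applied category can be sketched as
follows. The function name and the example counts are hypothetical, not the
subjects' data; the only element taken from the text is that a stimulus counts
as ambiguous when no category receives at least 75 percent of the responses.

```python
def consistent_label(counts, criterion=0.75):
    """Return the dominant label if it reaches the criterion, else None.

    `counts` maps each phonemic label to its number of responses for one
    stimulus. None marks the stimulus as ambiguous.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no responses recorded for this stimulus")
    label, n_responses = max(counts.items(), key=lambda item: item[1])
    return label if n_responses / total >= criterion else None


# Hypothetical response counts for two stimuli.
clear_case = consistent_label({"i": 9, "I": 3})      # 75 percent for /i/
ambiguous_case = consistent_label({"i": 8, "I": 4})  # below 75 percent
```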




Figure 8. Phonemic identification responses to stimuli along the [i]-[ae]
continuum.

The patterns of these labeling responses may be compared with the
histogram peaks in Figure 7. For DW, the three major peaks in the F1
distribution (bottom left) may be identified with the /i/, /I/, and /e/
categories, respectively. There are no peaks corresponding to /ɛ/ and /ae/.
The three peaks in the F2 distribution for DW (center left) correspond to /i/,
/I/, and /e/ again. The three peaks in BR's F1 distribution (bottom right)
correspond somewhat less clearly to /i/, /e/, and /ae/, while the three major
peaks in his F2 distribution (center right) reflect more unambiguously /i/,
/e/, and /ae/. There are no peaks for /e/ in F1 and for /ae/ in F2. For DW,
and to some extent also for BR, it appears that the number of functional
categories on this continuum is smaller in production than in perception,
contrary to the conclusions of Chistovich, Fant, de Serpa-Leitão, and
Tjernlund (1966).

This should not be taken to imply, however, that the subjects were
somehow less sensitive to stimulus differences in the imitation task than in
the phonemic labeling task. On the contrary, Figure 4 shows clearly that DW,
for example, imitated quite differently the three stimuli (1-3) uniformly
labeled as /i/. In fact, the clustering of modal responses in F1-F2 space
(Fig. 4) is difficult to relate to the phonemic category boundaries (Fig. 8),
which leaves the issue of the origin of the categorical tendencies in
production unresolved.


Figure 9. Average numerical identification responses to stimuli along both
vowel continua. The dashed lines have a slope of 1.

Phonemic labeling of vowels does not reflect the extent of the subjects'
perceptual sensitivity, as is abundantly clear from many earlier categorical
perception studies (see Repp, 1984, for a review). This is also demonstrated
by the absolute identification responses of the present two subjects, which
are graphed in Figure 9. These responses are a monotonic function of stimulus
number and show good discrimination of most adjacent stimuli, especially
toward the ends of the continua. Thus, for example, it is clear that DW not
only imitated differentially stimuli 1-3 on the [i]-[ae] continuum (all
labeled /i/) but also was able to discriminate them perceptually without
special training. More interesting, perhaps, is the observation that he also
discriminated stimuli 1-3 on the [u]-[i] continuum, which he had imitated in
identical fashion (cf. Fig. 3). Whether this implies a limitation on
production or a perceptual limitation caused by the higher stimulus
uncertainty in the imitation task remains to be seen. The poorer
discrimination in the centers of the continua may be a consequence of the AXB
paradigm, which provided endpoint anchors that facilitated discrimination of
stimuli in their vicinity. The steps in the functions bear only a vague
correspondence to the compression regions in the response formant space
(Figs. 3 and 4).



4. Summary and Conclusions

The data presented here are preliminary and do not permit any strong
conclusions, especially since their pattern exhibits some of the ambiguities
observed by Kent (1973), after whose study the present experiment was modeled.
A few tentative observations can be made, however, which should help guide
future research on this topic.

(1) It appears from the present data that the pattern of vowel imitation
responses is essentially insensitive to the delay between stimulus and
response (from 300 to 3000 ms). This agrees with the conclusions of
Chistovich, Fant, de Serpa-Leitão, and Tjernlund (1966), although it should be
noted that the present subjects did not shadow as rapidly as subject L.C.
Whatever internal representation of the stimulus mediates vocal imitation, it
seems to be both (virtually) immediately available and relatively long-lasting
in memory. A trend toward more pronounced patterns of formant variability in
immediate imitation was observed.

(2) Vocal response latencies do not seem to vary much across a vowel
continuum, as also observed by Chistovich, Fant, de Serpa-Leitão, and
Tjernlund (1966). Thus, imitation seems to bypass conscious response
selection, regardless of delay.

(3) In contrast to Chistovich, Fant, de Serpa-Leitão, and Tjernlund
(1966), and, to some extent, in contrast to Kent (1973), we found no discrete
steps in the response formant frequency functions. There was ample evidence,
however, for nonlinearities in the stimulus-response relationships. The
pattern of response variability resembled that of the formant frequencies for
one subject only; basically, however, changes in mean formant frequencies and
standard deviations can be assumed to reflect the same underlying tendencies.
The absence of a strictly categorical structure in responses to isolated,
stationary vowels is in agreement with the categorical perception literature
(see Repp, 1984). It is not clear whether the stimulus-response nonlinearities
should be interpreted in terms of perceptual categories, especially since they
were difficult to relate to the phonemic labeling responses. The hypothesis
needs to be pursued that these distortions have an independent origin in the
production system.



(4) Response frequency histograms for individual formants showed very
pronounced peaks and valleys. Thus the subjects had definite response
preferences, which seemed to correspond to some of their phonemic categories.
Not all phoneme categories were represented, however, and some categories had
a peak only in a single formant. The relation of the histogram peaks to the
stimulus-response nonlinearities is not easily characterized, although they
are, of course, not independent. Generally, peaks correspond to regions of
compression in the response formant space, and valleys correspond to regions
of expansion. The response histograms, however, seem to provide a much clearer
indication of underlying categories than do the formant frequency plots.

(5) We have found little support for Kent's (1973) tentative claim that
responses to the [u]-[i] continuum, which does not harbor familiar phonemic
categories apart from the endpoints, are less categorical than responses to
the [i]-[ae] continuum, which spans five phonemic categories in English. The
F2 histograms show three major peaks on both continua. It is true, however,
that on the [i]-[ae] continuum F1 histograms provide additional striking
information about response preferences, whereas F1 along the [u]-[i] continuum
is too restricted in range to be informative. Response variability,
especially in F2, was also much larger on the [u]-[i] continuum, presumably
due to the unfamiliarity of the vowel sounds on this continuum.

(6) There were considerable individual differences between the two
subjects, which might be related to their different language experiences. Some
instances of congruity were also noted. Individual differences seemed most
pronounced in the pattern of F2 frequencies.

(7) Chistovich, Fant, de Serpa-Leitão, and Tjernlund (1966) hypothesized
the existence of an intermediate stage of representation, characterized by a
number of categories exceeding (but including) the functional categories in
the subjects' language. With regard to the [u]-[i] continuum, the hypothesis
is supported, since both subjects seemed to have an additional category
between the two familiar endpoints. The hypothesis is not supported for the
[i]-[ae] continuum, however, where the number of categories in production
(i.e., peaks in the formant histograms) was smaller than that of relevant
vowel phonemes in the language. It is possible that this continuum was not
sampled finely enough to reveal its full categorical structure in vocal
reproduction.

In conclusion, the present results leave unresolved the issue of whether
an intermediate mental representation needs to be postulated to account for
nonlinearities in vowel imitation responses. These nonlinearities exist,
however, and the search for their origin should continue. We plan to extend
our research in several directions: by testing monolingual speakers of
different languages to examine the role of linguistic experience; by matching
the stimuli more closely to the subjects' vocal capabilities (as done
originally by Chistovich, Fant, de Serpa-Leitão, & Tjernlund, 1966) so as to
eliminate distortions that may arise in perceptual normalization; and by
studying in more detail the possibility that the observed nonlinearities have
their origin in articulation itself. Eventually, we also want to use more
realistic, time-varying speech stimuli. In general, the thrust of our research
will be to disentangle the perceptual and articulatory factors that jointly
constrain vocal imitation. This enterprise seems interesting and worthwhile,
and it provides but one example of the immense stimulus to speech research
provided by the pioneering work of Ludmilla Chistovich and her colleagues.

Footnotes

1 The term "mimicking," used by Chistovich, Fant, de Serpa-Leitão, and
Tjernlund (1966), suggests that the subject's response must match the stimulus
in every respect. This was nearly true in their study, where the stimuli were
modeled after vowels produced by the subject. In discussing our own data and
those of Kent (1973), we will use the more general term, "imitation," which
allows for stimulus-response differences in various irrelevant properties
(duration, fundamental frequency, etc.) as well as for differences in formant
frequencies caused by mismatches between the vocal tract implied by the
stimulus and that of the imitator.

2 We have yet to conduct a full survey of the relevant literature,
especially that published in Russian.

INFLUENCE OF FOLLOWING CONTEXT ON PERCEPTION OF THE VOICED-VOICELESS
DISTINCTION IN SYLLABLE-FINAL STOP CONSONANTS*

Bruno H. Repp and David R. Williams†



Abstract. This paper reports acoustic measurements and results from
a series of perceptual experiments on the voiced-voiceless distinction
for syllable-final stop consonants in absolute final position
and in the context of a following syllable beginning with a different
stop consonant. The focus is on temporal cues to the distinction,
with vowel duration and silent closure duration as the primary
and secondary dimensions, respectively. The main results are that
adding a second syllable to a monosyllable increases the number of
voiced stop consonant responses, as does shortening of the closure
duration in disyllables. Both of these effects are consistent with
temporal regularities in speech production: Vowel durations are
shorter in the first syllable of disyllables than in monosyllables,
and closure durations are shorter for voiced than for voiceless
stops in disyllabic utterances of this type. While the perceptual
effects thus may derive from two separate sources of tacit phonetic
knowledge available to listeners, the data are also consistent with
an interpretation in terms of a single effect, one of temporal
proximity of following context.

Introduction 

Acoustic cues to the perception of the phonological voiced-voiceless
distinction in American English syllable-final stop consonants have been
investigated quite intensively in recent years. One important cue is "vowel
duration" (i.e., the duration of the periodic stimulus portion taken to
correspond to the vowel; see, e.g., Raphael, 1972; Raphael, Dorman, &
Liberman, 1980), which is consistent with the commonly observed longer
duration of vowels preceding voiced consonants in speech production (House &
Fairbanks, 1953; Peterson & Lehiste, 1960). Other relevant perceptual cues
include the offset characteristics (i.e., formant transitions and amplitude
envelope) of the "vowel" (Wang, 1959; Wolf, 1978), its fundamental frequency
contour (Gruenenfelder & Pisoni, 1980; Lehiste, 1976), and, if the stop
consonant is released, the acoustic properties of the release (Malécot, 1958;
Wolf, 1978), as well as the duration and voicing of the closure interval
(Hogan & Rozsypal, 1980; Raphael, 1981). All these cues are also relevant for
stop consonants in intervocalic position, where additional voicing information
may be contained in the vocalic portion following the release burst (Lisker,
1978).

Although vowel duration is not always the most salient voicing cue (e.g.,
Wardrip-Fruin, 1982), it is nevertheless an acoustic dimension that has
consistently been found to influence the perception of phonological stop
consonant voicing in English. As a purely temporal cue, it has attracted
researchers' attention because it offers an opportunity to study the
sensitivity of phonetic perception to local and global changes in speaking
rate, a topic of much theoretical interest (Miller, 1981; Port, 1981). A
second temporal cue to the voicing distinction, the duration of the closure
interval, is available in intervocalic and released final stops. Provided that
closure voicing, an often overriding cue (e.g., Lisker, 1981), is eliminated,
closure duration provides important voicing information for intervocalic stops
(Lisker, 1957), though it is less salient in released utterance-final stops
(Raphael, 1981). Port and Dalby (1982) have proposed that the joint perceptual
influence of the two temporal variables, vowel duration and (silent) closure
duration, is best expressed by a constant ratio rule (however, see Massaro &
Cohen, 1983a). In production, too, the ratio of the two interval durations
(the C/V ratio) seems to be fairly constant across changes in global speaking
rate (Barry, 1979; Port, 1981). The ratio varies, however, across different
utterance positions and as a function of other voicing cues (Barry, 1979).
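
A constant ratio rule of the kind mentioned above can be sketched in a few
lines. This is an illustration under stated assumptions, not Port and Dalby's
formulation: the function names are hypothetical, the criterion is a
placeholder to be supplied by the caller, and the direction of the decision
(a small closure-to-vowel ratio favors "voiced") is inferred from the
production regularities described in the text.

```python
def cv_ratio(closure_ms, vowel_ms):
    """Ratio of silent closure duration to preceding vowel duration."""
    if vowel_ms <= 0 or closure_ms < 0:
        raise ValueError("need vowel_ms > 0 and closure_ms >= 0")
    return closure_ms / vowel_ms


def predicted_voicing(closure_ms, vowel_ms, criterion):
    """Predict the voicing percept from the C/V ratio alone.

    `criterion` is a hypothetical boundary ratio; ratios below it are
    labeled "voiced" (long vowel relative to closure), others "voiceless".
    """
    if cv_ratio(closure_ms, vowel_ms) < criterion:
        return "voiced"
    return "voiceless"
```

Because only the ratio enters the decision, stretching or compressing both
intervals by the same factor (a uniform change in speaking rate) leaves the
prediction unchanged, which is the property that makes such a rule attractive.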

The present study investigates the perception of phonological stop
consonant voicing in a context that has not been studied previously, viz.,
when a syllable-final stop is followed by another syllable beginning with a
stop consonant having a different place of articulation. Although the basic
voicing cues are likely to be those already studied extensively in connection
with stop consonants in intervocalic or absolute final position, sequences of
two different stop consonants have several peculiar properties that warrant
detailed investigation.

One consideration is that the total closure period for sequences of two
nonhomorganic stop consonants is about twice as long as that for a single
intervocalic stop (Repp, 1982; Westbury, 1977). When the first stop is
unreleased, as is frequently the case (Henderson & Repp, 1982), there is no
acoustic or perceptual basis for subdividing the total closure interval into
portions pertaining to the two consecutive (perhaps overlapping) stop
closures. This raises the question of whether closure duration is a salient
cue for the perception of voicing in this context. Certainly, if it has any
effect at all, one would expect to find different critical C/V ratios than for
single intervocalic stops.

When the first stop is released, the situation is similar to that for
released stops in absolute final position, except that the release bursts of
stops followed by another stop are generally much weaker (Henderson & Repp,
1982). This raises the question of whether such weak bursts can serve as a
perceptual marker delimiting the closure interval pertaining to the
syllable-final stop, thereby possibly changing the relative salience of the
closure duration cue and with it the critical ratio to vowel duration. It has
been shown that these weak release bursts carry considerable place of
articulation information (Repp, 1983b), so it is not unreasonable to expect
that they might have some influence on voicing perception as well.

Another prediction that could be made is that, in sequences of two
nonhomorganic stop consonants, the onset characteristics of the second
syllable will have less of an effect on the perceived voicing of the preceding
syllable-final stop than they have in the case of single intervocalic stops.
For one thing, the temporal separation of pre- and post-closure cues is
greater because of the extended closure interval, which makes perceptual
integration more difficult. In addition, the two stops have different places
of articulation, which may result in a perceptual segregation of the
respective voicing cues, syllable-initial cues pertaining only to the
syllable-initial consonant. On the other hand, it has been observed in such
VC1C2V stimuli that the perception of the place of articulation of one stop is
influenced by that of the other (Repp, 1983a), so it may be asked whether
there are similar (contrastive) interactions with regard to voicing
perception. Repp (1983a) linked apparent perceptual contrast effects in
place-of-articulation perception to listeners' intrinsic knowledge of
systematic variations in closure duration in natural speech. However, apart
from an unpublished study by Westbury (1977), little is known about the
acoustic consequences of phonological voicing in nonhomorganic stop sequences.
Therefore, the present study reports acoustic as well as perceptual data.

A final important goal of the present investigation was to demonstrate an
effect of following context on voicing perception by comparing the absolute
vowel durations required to change voiceless to voiced final stop percepts in
monosyllables and in disyllables. Since it is known that, in production, the
duration of a syllable decreases when a second syllable is added to form a
disyllabic word (Klatt, 1973; Lehiste, 1972), it was predicted that addition
of a second syllable would considerably reduce the absolute vowel duration at
the voiced-voiceless boundary in the first syllable, while maintaining the
perceptual relevance of the vowel duration cue. Indeed, an analogous effect
on the perception of phonological vowel length was shown long ago by Nooteboom
(1973). Given such a contextual effect, it might be asked further whether the
effect is dependent on whether or not the listener considers the two syllables
as parts of the same word, and by how much the temporal separation between the
two syllables can be increased before the voicing boundary for the final stop
of the first syllable approaches that for the final stop in an isolated
monosyllable (cf. Nooteboom & Doodeman, 1980).

For the present experiments, two pairs of monosyllabic English words were
sought, such that one ended and the other began with either a voiced or a
voiceless stop consonant (in terms of spelling, at least), and that made
sense in all four possible disyllabic combinations as well as in isolation.
Such a set does not exist. Rather than using nonsense materials, an
approximation was devised by using the words LAB/LAP and GOAT/COAT, which in
combination yield the real word LABCOAT, as well as the novel but potentially
meaningful compounds LAPCOAT, LABGOAT, and LAPGOAT. This set was considered
more attractive to listeners than complete nonsense, although the possibility
of semantic bias in phonetic perception (cf. Ganong, 1980) must be
considered. As will be seen, however, it is unlikely that such a bias
influenced the results in any significant way.

I. Acoustic Measurements 

Acoustic measurements were obtained to get a general idea of the acoustic
consequences of phonological stop consonant voicing in sequences of two
nonhomorganic stop consonants, and particularly in the kinds of utterances
used also in the perceptual experiments. The only relevant previous data were
reported by Westbury (1977), who measured three speakers' productions of
isolated CVC1C2VC nonsense utterances, in which C1 and C2 were stop
consonants differing in both voicing and place of articulation. Westbury's
acoustic measurements showed that, in voiced-voiceless stop sequences, the
vowel in the first syllable was about 10 ms longer and the closure interval
about 20 ms shorter than in voiceless-voiced sequences. Whether these
differences were due to the voicing characteristics of the first or the
second stop, or both, cannot be determined from Westbury's data. Also, C1 in
these utterances was either consistently unreleased, or Westbury ignored the
C1 release burst in his measurements.


In the present study, too, speakers produced isolated utterances. While
these productions are not representative of fluent speech, and a certain
amount of deliberate enhancement of phonetic differences may be expected, the
data are appropriate for comparisons with perceptual responses to similarly
isolated utterances, as collected in the subsequent experiments.


A. Method 

1. Subjects. Four native speakers of American English, three females
(CG, JM, AB) and one male (DW, the second author), served as talkers. CG and
JM grew up in New York, AB in the Midwest, and DW in California.


2. Utterances. The utterances were LAB, LAP, GOAT, COAT, LABGOAT, LABCOAT,
LAPGOAT, and LAPCOAT. Ten different random orders of these eight words were
concatenated and printed on a sheet of paper in standard English spelling.



3. Recording procedure. Each talker read from the list after practicing
for a few minutes. The instructions were to read at a steady rate, pausing
after each word, and to speak clearly but naturally. The utterances were
recorded in a sound-insulated booth using high-quality equipment.

4. Measurement procedures. Temporal properties of the utterances were
measured from magnified CRT waveform displays. Durations to the nearest tenth
of a millisecond were obtained between the following acoustic landmarks
(described here with reference to a disyllabic utterance): (a) the onset of
significant energy; (b) the point of change from [l] to [ae], as determined
visually by a noticeable change in the waveform of the glottal cycle; (c) the
beginning of the closure interval, as indicated by a significant damping or
cessation of voicing pulses; (d) the onset of the C1 release burst, if
present; (e) the end of the C1 burst; and (f) the onset of the C2 release
burst.

In this way, the durations of the following acoustic segments were
obtained: (1) [l] resonance, (2) [ae] resonance ("vowel"),¹ and (3) total
closure interval. If C1 was released, (3) could be subdivided into (3a) C1
closure, (3b) C1 release burst, and (3c) C2 closure. Monosyllabic LAB and LAP
were always released by three talkers; talker CG did not release 8 out of 20
utterances. In disyllabic context, labial release bursts were produced in all
tokens by talkers JM and CG; there were 11/40 unreleased tokens for AB and
5/40 for DW.
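
The mapping from landmarks (a)-(f) to segments (1)-(3c) can be sketched as
simple differences between landmark times. The following is only an
illustration: the function name, argument names, and the landmark times are
hypothetical and are not taken from the study's measurement software or data.

```python
from typing import Dict, Optional


def segment_durations(
    onset: float,
    l_to_ae: float,
    closure_start: float,
    c2_burst_onset: float,
    c1_burst_onset: Optional[float] = None,
    c1_burst_end: Optional[float] = None,
) -> Dict[str, float]:
    """Convert landmark times (ms) into the segment durations named in the text."""
    segments = {
        "l": l_to_ae - onset,                             # (1) [l] resonance
        "ae": closure_start - l_to_ae,                    # (2) [ae] resonance ("vowel")
        "total_closure": c2_burst_onset - closure_start,  # (3) total closure interval
    }
    # The subdivision (3a)-(3c) is defined only when C1 was released.
    if c1_burst_onset is not None and c1_burst_end is not None:
        segments["c1_closure"] = c1_burst_onset - closure_start  # (3a)
        segments["c1_burst"] = c1_burst_end - c1_burst_onset     # (3b)
        segments["c2_closure"] = c2_burst_onset - c1_burst_end   # (3c)
    return segments


# Hypothetical landmark times in ms (illustrative only, not measured values).
released = segment_durations(0.0, 78.0, 279.0, 455.0, 357.0, 371.0)
unreleased = segment_durations(0.0, 78.0, 279.0, 455.0)
print(released)
print(unreleased)
```

For an unreleased token only segments (1)-(3) are returned, which mirrors the
statement that the closure could be subdivided only if C1 was released.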

For each talker, means and standard deviations of these acoustic segment
durations were calculated from the 10 repetitions of each utterance. True
outliers were omitted; there were not more than a few for each talker.
Analyses of variance were conducted separately for each dependent variable,
using a repeated-measures design on the mean durations.
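
For a within-talker factor with only two levels, a repeated-measures F test
on per-talker means is equivalent to a paired comparison: F(1, n-1) equals
the square of the paired t statistic. The sketch below illustrates that
relation under stated assumptions; the helper name `paired_f` and the four
per-talker values are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_f(cond_a: Sequence[float], cond_b: Sequence[float]) -> Tuple[float, Tuple[int, int]]:
    """Return F and its degrees of freedom for a two-level within-subject factor.

    With two levels, the repeated-measures F(1, n-1) is the squared paired t.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # paired t statistic
    return t * t, (1, n - 1)


# Hypothetical per-talker mean [ae] durations (ms) for four talkers.
mono = [250.0, 240.0, 260.0, 234.0]
di = [175.0, 170.0, 180.0, 167.0]
f_value, dof = paired_f(mono, di)
print(f"F({dof[0]},{dof[1]}) = {f_value:.1f}")
```

With four talkers the denominator has 3 degrees of freedom, which is why the
acoustic analyses in this section are reported as F(1,3).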

B. Results and Discussion 

One effect of interest is the shortening of the first syllable in a
disyllabic word, as compared to its production in isolation. Although such
shortening has been described previously (e.g., Klatt, 1973; Lehiste, 1972;
Nooteboom, 1973), the added syllables in these studies were unstressed, while
the present disyllables had a spondaic stress pattern. Nevertheless,
shortening of the first syllable was exhibited consistently by all talkers.
Table 1 compares the durations of [l], [ae], C1 closure, and C1 release burst
in mono- and disyllables, averaging over the four talkers and over voicing
distinctions (rows labeled "Mean"). It is evident that most of the shortening
took place during the [ae] portion, an average reduction of 73 ms (30
percent), which was highly significant, F(1,3) = 44.6, p < .007. By contrast,
the C1 closure changed by only 10 ms (10 percent), F(1,3) = 16.5, p < .03,
and the [l] portion did not change significantly. In addition, a dramatic
difference in C1 release burst durations is evident, F(1,3) = 71.5,
p < .004: While utterance-final labial release bursts contained significant
amounts of aspiration or voicing, those in disyllabic utterances basically
represented only the brief noise generated by the parting of the lips (cf.
Henderson & Repp, 1982).

The amount of vowel shortening is comparable to that observed for trochaic
words (Klatt, 1973). Klatt also observed about twice as much shortening when
the first syllable ended in a voiced consonant than when it ended in a
voiceless consonant. Such a trend was also found in the present data (32
percent for LAB versus 26 percent for LAP), though it was much smaller,
primarily due to relatively less shortening of LAB than would be predicted
from Klatt's data.
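
The shortening percentages quoted above can be checked against the [ae]
means in Table 1. The following sketch simply redoes that arithmetic; the
function and variable names are introduced here for illustration only.

```python
from typing import Tuple


def shortening(mono_ms: float, di_ms: float) -> Tuple[float, float]:
    """Return (absolute shortening in ms, shortening as percent of the monosyllable)."""
    diff = mono_ms - di_ms
    return diff, 100.0 * diff / mono_ms


# [ae] durations (ms) from Table 1: (monosyllable, disyllable).
table1_ae = {"LAB": (295, 201), "LAP": (197, 146), "Mean": (246, 173)}

for label, (mono, di) in table1_ae.items():
    diff, pct = shortening(mono, di)
    print(f"{label}: {diff} ms shorter ({pct:.0f} percent)")
```

Run on the table means, this reproduces the 73 ms (30 percent) average
reduction and the 32 versus 26 percent figures for LAB and LAP.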


The second comparison of interest is that between phonologically voiced
and voiceless syllable-final stop consonants (LAB versus LAP in Table 1). It
is evident that the [ae] portion was longer, F(1,3) = 39.4, p < .009, and the
C1 closure duration was shorter, F(1,3) = 5.0, p < .12, in LAB than in LAP.
(The difference in C1 closure duration was shown by all four subjects but
varied considerably in magnitude; hence the low level of significance.) C1
release bursts tended to be longer when the stop was voiceless; however, this
difference was shown by only two talkers. The C2 closure was also affected by
C1 voicing, being shorter for LAB than for LAP, F(1,3) = 22.5, p < .02. The
duration of the initial [l], on the other hand, was completely unaffected by
stop consonant voicing. Another difference, not shown in Table 1, was that
the C1 closure of LAB usually contained low-amplitude voicing while that of
LAP did not. The voicing usually ceased before the end of the C1 closure.


The average difference in [ae] duration between LAB and LAP was 98 ms (33
percent) in monosyllables and 55 ms (27 percent) in disyllables. For C1
closure, the difference in duration was -25 ms (30 percent) in monosyllables
and -16 ms (21 percent) in disyllables. The C/V ratios in mono- and
disyllables, respectively, were 0.28 and 0.39 for LAB, and 0.55 and 0.65 for
LAP. This comparison shows that the addition of a second syllable increased
the C/V ratio, which thus was not invariant.
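
The C/V ratio is the C1 closure duration divided by the [ae] duration. As a
rough check, the sketch below recomputes it from the rounded means in Table
1; the names are illustrative. Because it starts from rounded table means, it
yields 0.64 rather than the reported 0.65 for LAP in disyllables, a
discrepancy that presumably reflects rounding or averaging of unrounded data.

```python
def cv_ratio(c1_closure_ms: float, vowel_ms: float) -> float:
    """Ratio of C1 closure duration to [ae] ("vowel") duration."""
    return c1_closure_ms / vowel_ms


# (C1 closure, [ae]) means in ms from Table 1.
table1 = {
    ("LAB", "monosyllable"): (83, 295),
    ("LAB", "disyllable"): (78, 201),
    ("LAP", "monosyllable"): (108, 197),
    ("LAP", "disyllable"): (94, 146),
}

for (word, context), (closure, vowel) in table1.items():
    print(f"{word} {context}: C/V = {cv_ratio(closure, vowel):.2f}")
```

In both words the ratio rises when the second syllable is added, since the
vowel shortens proportionally more than the closure.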

Table 1

Comparison of acoustic segment durations (ms) in monosyllables and in
disyllables, averaged across talkers and tokens.

                 [l]   [ae]   C1        C1      C2        Total
                              closure   burst   closure   closure

Monosyllables
  LAB             82    295     83      117
  LAP             83    197    108      136
  Mean            83    246     96      126

Disyllables
  LAB-            78    201     78       14       84      176
  LAP-            77    146     94       21      117      232
  Mean            78    173     86       17      101      204



Table 2

Comparison of average acoustic segment durations (ms) as a function of
following GOAT or COAT in disyllables.

                 [l]   [ae]   C1        C1      C2        Total
                              closure   burst   closure   closure

  -GOAT           78    177     87      ...      110      ...
  -COAT           78    170    ...      ...      ...      ...

A third comparison relevant to the perceptual experiments concerns the
effect of the voicing of C2 (GOAT vs. COAT) on acoustic properties of the
preceding syllable. Table 2 lists the relevant data, averaged over all other
factors. It is evident that there was very little effect: Only the [ae]
vowel was slightly shorter preceding COAT than preceding GOAT, a small
difference that was almost unreasonably consistent across talkers,
F(1,3) = 358.9, p = .0003. In addition, the C2 closure seemed to be affected
by C2 voicing, being longer for GOAT than for COAT. However, this difference,
which runs counter to the common finding of longer closures for voiceless
than for voiced stops, was exhibited by only two talkers and hence was
nonsignificant. Westbury's (1977) observation that the total closure was
shorter in voiced-voiceless than in voiceless-voiced C1-C2 sequences is thus
confirmed by the present data, even though closure durations exhibited large
individual differences.

To summarize: "Vowel duration" in LAB/LAP is substantially reduced when
a second syllable is added; it is shorter in LAP than in LAB; and it is
slightly shorter preceding COAT than preceding GOAT. C1 closure duration
tends to be shorter for LAB than for LAP, and it is shortened somewhat when a
second syllable is added. C2 closure may be longer preceding GOAT than
preceding COAT. C1 release bursts are substantially reduced in intersyllabic
position as compared to absolute final position.

II. Perception Experiments 

The perceptual studies were carried out in two stages. An initial
five-part study (Exps. 1-5) was followed by a two-part replication experiment
conducted several years later with a new set of stimuli (Exps. 6 and 7).

A. General Method: Experiments 1-5

1. Subjects. Nine paid student volunteers served as subjects. They
were all native speakers of American English and reported having no speech or
hearing problems.

2. Stimuli. Selected utterances of one female talker (CG) were used for
stimulus construction. A continuum from LAP to LAB was constructed from the
first syllable of a representative token of LABGOAT. The original duration of
that syllable (not including the closure interval) was 320 ms. A seven-member
continuum, not including the original syllable, was constructed by deleting
pitch pulses from the interior of the [ae] portion. The initial [l] portion,
approximately 62 ms long, was left undisturbed. The members of the LAP/LAB
continuum had vowel durations ranging from 118 to 238 ms in 20-ms steps. A
second LAP/LAB continuum, used only in Experiment 1, was constructed from the
first syllable of a good token of LAPCOAT whose original duration was 226 ms
(51 ms for [l]), by either deleting or duplicating pitch pulses in the [ae]
portion. The vowel durations of the seven members of that continuum ranged
from 134 to 255 ms in 20-ms steps.² These disyllable-derived stimuli were
acceptable as monosyllables with a neutral intonation, whereas stimuli
fashioned from LAB or LAP produced in isolation would have been unacceptable
in the context of a disyllabic word because of their falling intonation. A
strongly falling intonation contour would also have made construction of a
syllable duration continuum problematic.






Repp & Williams: Influence of Following Context on Perception

In some conditions, the stimuli from the LAP/LAB continuum were followed
by one of two C1 release bursts. These bursts and the surrounding closure
intervals were derived from tokens of LABCOAT and LAPGOAT, respectively. Any
closure voicing present was replaced with silence, so as to eliminate a
potentially overriding (Lisker, 1981) nontemporal cue. The C1 closure
durations were 65 and 88 ms, respectively, the release burst durations were 9
and 13 ms, and the total closure durations were 170 and 2[?]3 ms. These
durations were representative of those observed in the sample of utterances
recorded and analyzed (see Table 1). Note that closure duration and origin of
release burst were confounded.


A good token of GOAT, with a voice onset time (VOT) of 22 ms and a total
duration of 473 ms (including the final [t] release burst), was excerpted
from LABGOAT. Since it was desirable to use tokens of GOAT and COAT that
differed only in their initial VOT, the initial 66 ms of the second syllable
of LABCOAT, representing the aperiodic portion (i.e., the VOT) of COAT, was
substituted for the initial 66 ms of GOAT to yield an acceptable token of
COAT.

The stimuli for each experiment were recorded on audio tape in five
randomized blocks with intertrial intervals of 2.5 s.

3. Procedure. Each subject participated in two sessions. In the first
session, the stimulus tapes representing Experiments 1, 2, and one additional
test (see Footnote 3) were presented. In the second session, Experiments 5,
3, and 4 were administered, always in that order. Subjects listened over
TDH-39 earphones in a quiet room. Their task was to identify in writing the
final stop consonant of the first syllable ("b" or "p") and, when a second
syllable followed, its initial stop consonant as well ("g" or "c"), even when
it was constant (except for Exp. 4).

B. Experiment 1

The purpose of this first test was twofold: to assess the influence of a
final release burst on the LAP/LAB distinction in monosyllabic stimuli, and
to compare perception of two LAP/LAB continua, one derived from an original
utterance of LAB-, the other from LAP-. Several earlier investigators have
noted that it is easier to change an originally voiced syllable-final stop
consonant into a voiceless one by manipulating vowel and/or silent closure
duration than vice versa (Hogan & Rozsypal, 1980; Price & Lisker, 1979). At
the very least, a "trading relation" between vowel duration and differential
vowel offset cues was expected in Experiment 1. As to the perceptual
contribution of the C1 closure and release burst, previous investigations
(e.g., Hillenbrand, Ingrisano, Smith, & Flege, 1984; Malécot, 1958; Wolf,
1978) employing release bursts appropriate for utterance-final stops
generally found only small effects. Although the disyllable-derived release
bursts in the present stimuli were acoustically weaker, the relative
ambiguity created by varying vowel duration was expected to enhance any
effects of secondary voicing cues.

1. Method. For the seven stimuli from each LAP/LAB continuum, a final
release burst (with associated C1 closure) was either present or absent and,
if present, derived from either LAB- or LAP-. Stimuli from a GOAT/COAT
continuum (see Footnote 3) were interspersed as fillers.

2. Results and discussion. The results are shown in Figure 1. It can
be seen, first, that the average percentage of "b" responses increased as
vowel duration increased. Thus, vowel duration was an effective cue to the
voiced-voiceless distinction, as intended. Second, "b" responses were much
more frequent to the LAB-derived stimuli than to the LAP-derived stimuli,
which reflects additional important cues presumably located at the offset of
the periodic stimulus portion. This difference was highly significant in a
repeated-measures analysis of variance, F(1,8) = 33.1, p < .001. Third, the
presence of a release burst (and of an associated C1 closure interval) did
make a difference, F(2,16) = 11.9, p < .001. The effect was generally one of
reducing "b" responses, even when the burst and closure derived from LAB-, at
least in the case of the LAP-derived continuum. A separate analysis of
variance was conducted on stimuli with bursts only. Bursts derived from LAP-
led to fewer "b" responses than did bursts derived from LAB-, F(1,8) = 8.8,
p < .02, and this difference was more pronounced for the LAB-derived
continuum, F(1,8) = [?].8, p < .05 for the interaction.




[Figure 1 plot not reproduced; horizontal axis: vowel duration (ms).]

Figure 1. Effects of vowel offset cues (LAB- vs. LAP-derived continuum) and
C1 closure/release burst on the voiced-voiceless distinction for
syllable-final stops along a vowel duration continuum (Experiment 1).



These results confirm the relative salience of vowel duration as a voicing
cue. Although, within the range of durations used here, a complete change of
perceived category was not achieved in either continuum, vowel duration was
about equally effective in changing LAB to LAP, and LAP to LAB. This is in
contrast to some earlier studies that have found it difficult to change final
voiceless stops into voiced ones (Hogan & Rozsypal, 1980; Price & Lisker,
1979). The lack of such an asymmetry in the present stimuli may be due to
their having been produced in the context of a disyllabic word, which perhaps
made the vowel offset cues somewhat less pronounced. Still, they were quite
strong, being "worth" roughly 80 ms of vowel duration at the point of maximal
ambiguity.

The finding that release bursts reduced "b" responses regardless of the 
burst's origin may be attributed to the absence of closure voicing: Presence 
of a release burst defined a closure interval that was always silent and thus 
more appropriate for "p" than for "b". The differential effect of LAB- and 
LAP-derived bursts may have been due either to properties of the release 
bursts themselves or to C1 closure duration (or both). Why this effect was
more pronounced with the LAB-derived continuum is not clear. 

C. Experiment 2 

The purpose of Experiment 2 was fourfold. First, the LAP/LAB stimuli
were now presented in a disyllabic context (i.e., followed by GOAT or COAT),
and a shortening of the absolute vowel duration necessary to cue the LAP/LAB
distinction was expected relative to Experiment 1, by analogy to the findings
of Nooteboom (1973; Nooteboom & Doodeman, 1980). Second, Experiment 2
investigated the perceptual contribution of (total) closure duration in the
disyllabic context. Third, the effect of presence versus absence of a C1
release burst was also studied in this new context. Finally, possible
perceptual contrast effects due to the voicing category of the initial
consonant of the second syllable (GOAT or COAT) were assessed.

1. Methods. Only the LAB-derived LAP/LAB continuum was used. These
syllables were followed by one of four closure intervals and by either GOAT
or COAT. Two of the closure intervals were those also used in Experiment 1,
which contained a release burst derived from either a voiced or a voiceless
syllable-final stop. In contrast to Experiment 1, however, where only the C1
closure was defined (65 or 88 ms), here the C2 closure was defined as well by
the onset of the GOAT or COAT syllable. Two additional conditions resulted
from substituting silence for the release bursts so that the closure interval
consisted of either 170 or 2[?]3 ms of pure silence.

2. Results and discussion. Analysis of subjects' responses revealed no
influence of the GOAT/COAT contrast on the LAP/LAB distinction, F(1,8) = 1.7.
Therefore, Figure 2 shows the results collapsed over this factor, as a
function of total closure duration and presence versus absence of a release
burst. It is evident from the figure and from the statistical analysis that
the release burst had no systematic effect, F(1,8) = 1.5. However, total
closure duration did have an influence: Fewer "b" responses were obtained
with the longer closure duration, F(1,8) = 10.9, p < .002. Finally, contrary
to expectations, the LAP/LAB boundaries were located at about the same point
as in Experiment 1, revealing no influence of the addition of a second
syllable.

The absence of this expected contextual effect must be interpreted with
caution because of the different stimulus ensembles used in Experiments 1 and
2. In Experiment 1 the inclusion of LAP-derived stimuli in the test sequence
may have had a contrastive effect that pushed the boundary for the
LAB-derived stimuli toward shorter vowel durations. That this was the case is
suggested by the results of Experiments 4 and 6, which directly compared
LAP/LAB stimuli in isolation and in disyllabic context, and obtained a
reliable difference (see below).

Figure 2. Effects of total closure duration and release burst on the
voiced-voiceless distinction for syllable-final stops in disyllables
(Experiment 2).

The absence of an effect of GOAT versus COAT on perception of the LAP/LAB
contrast indirectly supports Repp's (1983a) conclusion that there are no
perceptual contrast effects between syllable-final and syllable-initial stop
consonants (see also Ades, 1974; Samuel, Kat, & Tartter, 1984). What seemed
like a contrast effect (for place of articulation) in Repp's study was
ultimately attributed to perceptual information conveyed by closure duration.
The present acoustic measurements showed that GOAT versus COAT had only a
negligible influence on the duration of the preceding closure, so the absence
of any perceptual effect on the LAP/LAB distinction is consistent with speech
production. Incidentally, the absence of any response preference for the real
word LABCOAT over the disyllabic pseudowords suggests that semantic biases
played no role in the present experiment.

The absence of any effect due to the release burst was somewhat surprising
in view of the fairly large effect obtained in Experiment 1. While that
effect could have been due to either closure duration or properties of the
release bursts themselves, neither variable was effective in the disyllabic
context. One possible explanation is that the following syllable had a
masking effect on the weak release burst, making it difficult to detect
(cf. Henderson & Repp, 1982).

The only significant effect obtained in Experiment 2, that of total
closure duration, is consistent with the acoustic measurements that showed
total closure duration to be longer following a voiceless syllable-final
stop.



D. Experiment 3



Experiment 3 investigated further the potential role of the release burst
in disyllabic context by manipulating its position within the closure
interval.

1. Methods. Stimuli from the LAB-derived continuum were always followed
by COAT. The single release burst used was the one originally taken from
LAPCOAT. The silent intervals surrounding this release burst were modified as
follows: The total closure duration was made either short (120 ms) or long
(200 ms), and the release burst was placed 40 or 80 ms (in the short
interval) or 40, 80, 120, or 160 ms (in the long interval) after the
beginning of the closure, defining corresponding C1 closure durations.

2. Results and discussion. The effect of total closure duration was
again obtained, F(1,8) = 12.2, p < .01; the results resembled those obtained
in Experiment 2 (see Fig. 2). Effects of release burst position (i.e., C1
closure duration), on the other hand, were small and apparent only in the
longer closure interval: F(3,24) = 3.7, p < .03, in a separate analysis. The
effect was not monotonic: "b" responses decreased slightly as C1 closure
duration increased from 40 to 80 to 120 ms, which is in the expected
direction, but increased again for a C1 closure of 160 ms. Perhaps this very
late-occurring release burst effectively suggested a shortening of the total
closure interval. Alternatively, the burst may have been masked by the onset
of the second syllable in that condition, which restored a "long C1 closure"
to a neutral value.



E. Experiment 4

In this test, isolated LAP/LAB stimuli were directly compared with
disyllabic stimuli, either with or without release bursts. Thus, the issue of
whether LAP/LAB syllables from a vowel duration continuum exhibit a shorter
category boundary in disyllabic context than in isolation was re-examined.
Recall that the Experiment 1 vs. Experiment 2 comparison yielded a negative
result, but this was attributed to possible stimulus range effects. In
addition, a possible influence of the rate of production of the second
syllable on the perception of the LAP/LAB contrast was investigated (cf. Port
& Dalby, 1982).

1. Methods. The LAB-derived continuum was used in conjunction with the
LAB-derived closure interval (total duration 170 ms) and release burst, which
was either present or absent. There were two versions of the second syllable:
the GOAT used previously, which had been produced in a disyllabic context and
was 473 ms in total duration, and another token of GOAT from the same
speaker, which had been produced in isolation and measured 610 ms. In this
test, the subjects identified only the syllable-final stop.

2. Results and discussion. Presence versus absence of a release burst
made no significant difference, F(1,8) = 1.5. In hindsight, this is not
surprising in view of the fact that the burst used happened to be the one
that had little effect in Experiment 1 (see the two left-most functions in
Fig. 1). The average data did suggest an effect in the expected direction for
isolated LAP/LAB syllables, but the relevant interaction was not nearly
significant, F(1,8) = 1.2.

Collapsing over this factor, Figure 3 compares the labeling functions for
LAP/LAB syllables in isolation and when followed by either version of GOAT.
In the disyllables, there were somewhat more "b" responses when the long GOAT
followed than when the short GOAT followed. This effect was very small but
reached significance, F(1,8) = 7.8, p < .03. A similar effect of the rate of
production of the second syllable on the DIGGER-DICKER distinction was
reported by Port and Dalby (1982), although in their stimuli closure
duration, not vowel duration, was the primary voicing cue. In addition, a
shift in the boundary for isolated syllables relative to disyllables can be
seen in Figure 3, F(1,8) = 9.5, p < .02. This is the hypothesized effect of
syllabic context, which failed to emerge in a comparison of Experiments 1 and
2. Thus, the suspicion that this earlier comparison was invalid because of
differences in stimulus ensemble tends to be confirmed by the present data.
Another replication of this context effect was sought in Experiment 6.



Figure 3. Effect of adding a second syllable, and of the rate of production
of that syllable, on the voiced-voiceless distinction for syllable-final
stops (Experiment 4).

F. Experiment 5

One result has emerged clearly from Experiments 2 and 3: In disyllables,
the number of "b" responses decreases as total closure duration increases.
Presumably, this indicates that phonological voicing information is conveyed
by closure duration, in agreement with the acoustic measurements. Experiment
4 also suggests that the LAP/LAB boundary for isolated syllables is indeed at
a longer vowel duration than that for the same syllables in disyllabic
context. This difference may be attributed to a perceptual compensation for
the expected shortening of the first syllable in disyllabic context (cf.
Nooteboom, 1973). Note, however, that the direction of the boundary shift
when a second syllable is added (viz., the increase in "b" responses) is the
same as results from a shortening of the closure duration in disyllables.
Thus, it is conceivable that the effects of closure duration and of syllabic
context are one and the same, reflecting temporal proximity of following
context.

The precise time course of the change in the LAP/LAB boundary in
disyllables as a function of a wide range of closure durations may provide
relevant information. Certainly, as the closure duration is increased to very
long values, the influence of the second syllable on the LAP/LAB boundary
should cease, and the boundary should equal that for isolated monosyllables.
Experiment 5 sought to determine the temporal separation (closure duration)
at which this asymptote is reached, as well as the shape of the function
relating the LAP/LAB boundary to closure duration. If there is only a single
factor involved (temporal proximity of following context), this function
should be monotonically increasing until the asymptote is reached. On the
other hand, if there are two factors (closure duration acting as a voicing
cue, and presence/absence of syllabic context making an independent
contribution), then closure durations typical of voiceless stops (i.e.,
around 230 ms, cf. Table 1) should lead to a relative decrease in "b"
responses counteracting the effect of syllabic context, which increases "b"
responses. Thus, depending on the relative strengths of the two opposing
effects, the function may either be nonmonotonic, or have an early asymptote,
or exhibit a change in slope around the point where the cue value of closure
duration changes polarity (i.e., around 200 ms).

The subjects in Experiment 5 were also asked to judge on each trial
whether they thought the two syllables formed a single compound (pseudo-)word
or whether they sounded like two unrelated monosyllables. It was of interest
to determine whether the "one word"-"two words" boundary (1/2 boundary, for
short) would coincide with the intersyllabic temporal separation (i.e.,
closure duration) at which the LAP/LAB boundary function reached its
asymptote. Such a finding might suggest a top-down influence on phonetic
perception, or at least a common factor influencing both types of judgment.

1. Methods. The stimuli from the LAB-derived LAP/LAB continuum were
followed by the standard GOAT at each of 8 temporal separations (closure
intervals) ranging from 150 to 500 ms in 50-ms steps. The closure intervals
were completely silent; release bursts were not included in this test.

In addition to identifying both stop consonants, subjects were asked to
indicate for each two-syllable sequence whether it sounded like a single
disyllabic word (LABGOAT or LAPGOAT) or like two unrelated monosyllabic words
(LAB, GOAT or LAP, GOAT). They indicated the latter judgment by placing a
comma between the two consonant responses ("b,g" or "p,g"). The results of
one subject were discarded because she gave no "p" responses. (Reasons
unknown.)



2. Results and discussion. The average category boundary on the LAP/LAB
continuum was determined by linear interpolation between the two data points
straddling the 50-percent cross-over, separately for each closure duration.
The solid line in Figure 4 shows this vowel duration boundary as a function of
closure duration. It is evident that the boundary shifted to longer vowel
durations (i.e., "b" responses decreased) as closure duration increased from
150 to 250 ms. The overall effect of closure duration was highly significant,
F(7,49) = 8.6, p < .0001. However, the boundary remained fixed at closure
durations beyond 250 ms, F(5,35) = ... This is just slightly beyond the
typical closure duration following voiceless stops. No nonmonotonic trend is
evident.
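The boundary estimate described above (linear interpolation between the two data points straddling the 50-percent cross-over) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `category_boundary` and the identification percentages are hypothetical and were invented for the example.

```python
def category_boundary(stimulus_values, percent_b):
    """Locate the 50-percent cross-over by linear interpolation.

    stimulus_values: continuum steps (e.g., vowel durations in ms), ascending.
    percent_b: percentage of "b" responses at each step.
    Returns the interpolated stimulus value, or None if 50 percent is
    never crossed.
    """
    points = list(zip(stimulus_values, percent_b))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Two neighbouring points straddle 50 percent when they lie on
        # opposite sides of it (or one of them sits exactly on it).
        if y0 != y1 and (y0 - 50.0) * (y1 - 50.0) <= 0:
            return x0 + (50.0 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical identification data, invented for illustration only.
vowel_ms = [140, 160, 180, 200]
pct_b = [10.0, 30.0, 70.0, 95.0]
boundary = category_boundary(vowel_ms, pct_b)
print(boundary)  # 170.0
```

In the experiment this computation would be repeated once per closure duration, giving one boundary value per point on the solid line of Figure 4.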



"Two words" responses increased monotonically as closure duration
increased, as expected. The 1/2 boundary, expressed in terms of closure
duration, was calculated separately for each vowel duration and is shown as the
dotted line in Figure 4. This boundary decreased strongly as vowel duration
increased, F(6,42) = 30.4, p < .0001, except at the shortest vowel durations.
The slope of this function is not far from -1 (a dashed line with this slope
is drawn in Figure 4); in other words, the sum of vowel and closure durations
at the 1/2 boundary tended to be constant. Thus, it appears that the subjects
based their judgments not on the silent intersyllable interval but on the
interval between the onsets of the first and second syllables. Note also that
the boundary continued to decrease at the longest vowel durations where the
syllable-final consonant was almost uniformly labeled "b". Therefore, these
judgments did not seem to be contingent on identification of the stop
consonant, although the steepest slope of the 1/2 boundary function did occur in
the region of the LAP/LAB boundary. In addition, it may be noted that the 1/2
boundary function intersects the LAP/LAB boundary function at 350 ms of
closure silence, i.e., 100 ms beyond the point at which closure duration loses
its effectiveness as a voicing cue. Thus, whatever the process responsible
for the effect of closure duration on voicing judgments, it does not seem to
be a direct consequence or a common determinant of perceiving the two
syllables as part of a single word.

These results give no reason to consider the effect of adding a second
syllable as different, in principle, from the effect of shortening the closure
interval in a disyllable. Indeed, it appears that adding a second syllable
has an effect only when the resulting closure interval is sufficiently short.
Relative to LAP produced in isolation, shortening of the vowel in LAP was
observed even when a closure interval averaging 232 ms intervened before the
second syllable (cf. Table 1). Yet, the perceptual effect of adding a
syllable after that long a closure duration was almost nil (cf. Fig. 4).* This
may mean that closure duration should be viewed as an asymmetric cue: Short
closures are a cue for the category "voiced", but long closures are devoid of
any perceptual cue value. That is, a stop never sounds "more voiceless" in
disyllabic context than in isolation. Alternatively, a tendency to hear LAP
at closure durations characteristic of voiceless stops (200-250 ms) may have
been cancelled by an opposing tendency to hear LAB because of temporal
recalibration (i.e., a subjective stretching of the vowel) in the presence of a
second syllable. Although this two-process model is not implausible, a
single-process explanation must be preferred on grounds of parsimony. A summary
of this argument is presented schematically in Figure 5. (However, see the
General Discussion.)

Figure 4. The voiced-voiceless boundary (solid line) and the 1/2 boundary
(dotted line) as a function of closure duration and vowel duration
in disyllables (Experiment 5). The dashed line represents a slope
of -1, i.e., a constant sum of vowel and closure durations.

Figure 5. Schematic illustration of a two-process model of the perceptual
effect of closure duration. Closure durations of less than 200 ms
are assumed to cue voicedness, whereas closure durations of about
200-300 ms are assumed to cue voicelessness (function a). At the
same time, a constant temporal compensation due to the presence of
a second syllable is assumed to occur, as long as the closure
duration does not exceed about 250 ms (function b). The resultant
(function c) shows only an increase in voiced percepts because
functions a and b cancel beyond 230 ms or so.
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G. Summary of Experiments 1-5 

In summary, these studies show:

(1) that addition of a second syllable shifts the LAP/LAB boundary toward
shorter vowel durations, as long as the temporal separation (closure)
is less than about 250 ms;

(2) that total closure duration (in the range below 250 ms) is a cue to
the LAP/LAB distinction, with shorter closures leading to more "b"
responses;

(3) that the effect of syllabic context is not a direct consequence of
hearing the two syllables as part of a single word, and that it may
indeed be identical with (2);

(4) that judgments of "two words" increase almost linearly with the
duration of the first syllable and thus seem to rest on the perceived
separation of syllable onsets;

(5) and that the C1 release burst and properties of the second syllable (VOT,
overall duration) play at best a minor role in the perception of the
LAP/LAB distinction in disyllabic context.

Experiments 6 and 7 attempted to replicate findings with a new
set of stimuli and a new group of subjects.

H. General Methods: Experiments 6-7

1. Subjects. Sixteen undergraduate students enrolled in an introductory
psychology course at the University of Connecticut participated in the
experiment for course credit. All subjects were native speakers of American English
with no history of hearing impairment.

2. Stimuli. Three representative disyllabic utterances of the male
talker (DW), digitized at 10 kHz, served as bases for the LAB, GOAT, and COAT
stimuli used in the present experiments. The LAB stimulus was excerpted from
a good LABCOAT. Tokens of LABGOAT and a second LABCOAT yielded the GOAT and
COAT stimuli.

A LAP/LAB continuum was constructed by successively deleting every other 
pitch pulse from the vocalic region of the LAB stimulus. The first deleted 
pitch pulse began 55 ms into the syllable (following the formant transitions 
for /l/) and the last, one pitch pulse prior to closure for /b/. In all, 
eight pitch pulses were excised, yielding a series of nine stimuli. The
longest stimulus was the original LAB (220 ms); the shortest ("LAP") was 135 ms
in duration. The vowel durations thus ranged from 80 to 165 ms across the
LAP/LAB continuum. Note that these durations are considerably shorter than
those employed in Experiments 1-5, reflecting differences in the speaking
rates of talkers DW and CG.
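The arithmetic of this continuum can be checked with a short sketch. It rests on two assumptions that are not stated in the text: that all eight excised pitch pulses were of equal length, and that the non-vowel portion of the syllable is the 55 ms implied by the difference between the stated syllable and vowel durations. The names below are hypothetical.

```python
ORIGINAL_LAB_MS = 220.0   # longest stimulus, from the text
SHORTEST_LAP_MS = 135.0   # shortest stimulus, from the text
PULSES_EXCISED = 8        # from the text
NON_VOWEL_MS = 55.0       # assumption: 220 - 165 = 135 - 80 = 55 ms


def syllable_continuum(longest, shortest, n_pulses):
    """Syllable durations after removing 0..n_pulses equal-length pulses."""
    pulse_ms = (longest - shortest) / n_pulses  # assumed equal pulse length
    return [longest - k * pulse_ms for k in range(n_pulses + 1)]


syllables = syllable_continuum(ORIGINAL_LAB_MS, SHORTEST_LAP_MS, PULSES_EXCISED)
vowels = [s - NON_VOWEL_MS for s in syllables]

print(len(syllables))           # 9
print(syllables[0] - syllables[1])  # 10.625 (ms per excised pulse)
print(vowels[0], vowels[-1])    # 165.0 80.0
```

Under these assumptions each excised pulse is about 10.6 ms long, which corresponds to a fundamental frequency of roughly 94 Hz for this talker.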

When GOAT or COAT was appended to the LAP/LAB stimuli to form
disyllables, the closure was completely silent; no release bursts of the
syllable-final stop consonant were included in Experiments 6 and 7. The duration
of the COAT stimulus (487 ms) was somewhat greater than that of the GOAT
stimulus (... ms), although this difference was located mainly in the final
release burst.

3. Procedure. Two stimulus tapes (one for each experiment) were
prepared with intertrial intervals of 2.5 s and presented to subjects binaurally
over TDH-39 headphones. Both tapes were heard during the 75-minute session,
with Experiment 7 always following Experiment 6.

J. Experiment 6

In this study, voicing judgments for disyllables were compared with those
for LAP/LAB monosyllables when presented in a separate test and when included
in the disyllable test. We also wished to replicate the effect of closure
duration on the voicing boundary, using (silent) closure durations that were
appropriate for the present, shorter stimuli. In addition, the possible
influence of the natural GOAT and COAT on voicing judgments was re-assessed.

1. Method. Two different stimulus sequences were presented. In the
first, only the nine stimuli from the monosyllabic LAP/LAB series were
included. Following five repetitions of the endpoint stimuli, subjects listened to
ten randomized blocks of the nine stimuli. In the second sequence, each block
included two occurrences of the monosyllabic stimuli interspersed among single
occurrences of stimuli from four disyllabic continua. The disyllabic stimuli
were constructed by appending the GOAT or COAT stimulus to each member of the
LAP/LAB series following either a 120 ms or a 170 ms silent interval. These
intervals corresponded to speaker DW's average closure durations in his
utterances of LAB- and LAP- disyllables, respectively. There were five blocks of
54 stimuli. Appropriate responses to the monosyllabic stimuli were "b" and
"p". For the disyllables, subjects responded with "bg", "pg", "bc", or "pc"
depending on whether LABGOAT, LAPGOAT, LABCOAT, or LAPCOAT was heard.

2. Results and discussion. In the disyllables, there was a small but
significant effect of the identity of the following syllable, GOAT vs. COAT,
F(1,15) = ..., p < .002. Voiced responses were more frequent preceding COAT,
which is consistent with three alternative explanations: (1) If closure
durations were longer preceding COAT in production, then listeners' tacit
knowledge of that regularity might lead them to shift their perceptual criterion in
favor of voiced responses in that context. (2) The effect may be due to
second-syllable duration, COAT being longer than GOAT in the present experiment.
(3) The effect may represent response contrast between the voicing categories
of C1 and C2. Considering that (1) is not supported by our acoustic
measurements (see Table 2) and that (3) was not obtained in Experiment 2, the most
likely explanation seems to be (2), in accordance with the effect of
second-syllable duration obtained in an earlier experiment.

Figure 6 presents the results for monosyllables (dashed lines) and for
disyllables collapsed over the GOAT/COAT factor (solid lines). As expected,
increasing the closure duration in disyllables had the effect of shifting the
voicing boundary toward longer vowel durations. Significantly fewer "b"
responses were made to disyllables with long closures than to those with short
closures, F(1,15) = 21.7, p < .0003. It is also evident that subjects gave
significantly fewer "b" responses to the monosyllables than to the
disyllables, F(1,15) = 73.0, p < .0001. Thus, these results replicate for the
present set of stimuli the finding (Experiment 4) that the LAP/LAB boundary is
shifted toward shorter vowel durations in disyllabic context.

Figure 6. Percent voiced responses as a function of vowel duration for
isolated monosyllables, monosyllables interspersed among disyllables,
and disyllables with two closure durations (Experiment 6).

Figure 7. The voiced-voiceless boundary (solid line) and the 1/2 boundary
(dotted line) in disyllables as a joint function of closure duration
and vowel duration (Experiment 7). The dashed line represents
a slope of -1, i.e., a constant sum of vowel and closure durations.

Figure 6 further shows that subjects gave fewer "b" responses to the
monosyllables that were interspersed among the disyllables than to those that
were presented alone, F(1,15) = 13.2, p = .003. This represents a stimulus
range effect: Because the syllable-final stops sounded relatively more voiced
in disyllables, they presumably sounded relatively more voiceless in the
interspersed monosyllables. The finding of such an effect lends further
credence to the earlier argument proffered in connection with Experiments 1
and 2, where the apparent absence of an effect of syllabic context was
attributed to the presence of stimulus range effects.

K. Experiment 7

This experiment replicated Experiment 5 using the new set of stimuli.

1. Method. The GOAT stimulus was appended to the stimuli from the
LAP/LAB series at each of eight temporal separations (closure durations)
ranging from 100 to 450 ms in 50 ms steps. These stimuli were recorded in five
randomized blocks. Subjects were asked to decide on each trial whether they
heard /b/ or /p/, and whether they heard one two-syllable word or two
one-syllable words, using the responses B1, P1, B2 or P2.

2. Results and discussion. As in Experiment 5, the average category
boundary on the LAP/LAB continuum was determined for each closure duration by
means of linear interpolation. These boundaries are plotted as a function of
closure duration in Figure 7 (solid line). As expected, the overall effect of
closure duration was highly significant, F(7,105) = 50, p < .0001:
Increasing closure duration shifted the LAP/LAB boundary toward longer vowel
durations. The effect leveled off around closure durations of 200-250 ms. But,
in contrast to Experiment 5, there was a small increase in the boundary even
beyond 300 ms, F(3,45) = 3.7, p < .02. Again, the boundary function is
monotonic, with an asymptote close to the boundary for monosyllables in Experiment
6.

The 1/2 boundary was calculated separately for each vowel duration and is
plotted as a function of vowel duration in Figure 7 (dotted line). As in
Experiment 5, it is evident that the boundary decreases sharply as vowel
duration increases, F(8,120) = 29.7, p < .0001 (except perhaps at the shortest and
longest vowel durations). Again, the slope of the function is close to -1,
indicating that the sum of vowel and closure duration at the 1/2 boundary
tends to be constant. Thus, the data confirm that these judgments were based
on the interval between the onsets of the first and second syllables.
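The link between a slope of -1 and a constant sum of vowel and closure durations can be made concrete with a small sketch. The boundary values below are hypothetical and chosen to lie exactly on such a line; the helper `least_squares_slope` is an illustrative name, and an ordinary least-squares fit is only one assumed way of estimating the slope.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical 1/2 boundaries (closure duration in ms) per vowel duration.
vowel_ms = [80, 100, 120, 140]
half_boundary_closure_ms = [300, 280, 260, 240]

slope = least_squares_slope(vowel_ms, half_boundary_closure_ms)
sums = [v + c for v, c in zip(vowel_ms, half_boundary_closure_ms)]

print(slope)  # -1.0
print(sums)   # [380, 380, 380, 380]
```

When the slope is -1, every millisecond added to the vowel is offset by a millisecond removed from the closure at the boundary, so the sum (vowel plus closure) stays fixed. Since the portion of the first syllable preceding the vowel is the same for all stimuli, a constant sum corresponds to a constant onset-to-onset interval.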

Because of the shorter syllable durations, the subjects in Experiment 7
were probably more inclined to consider the two syllables as separate words
than were the subjects in Experiment 5 (see footnote 5). Thus, the "one
word"/"two words" boundary function intersects the voicing boundary function
at 200 ms of closure duration (versus 350 ms in Experiment 5). While this
would be consistent
with the hypothesis that syllabic context (or closure duration) has its effect
contingent on perception of the two syllables as part of the same utterance,
this hypothesis was rejected on the basis of the results of Experiment 5.
That is, the coincidence of the 1/2 boundary with the leveling off of the
syllabic context (or closure duration) effect may be just that, a coincidence.
At any rate, the data of Experiment 7 are again consistent with a
single-process explanation of the effect of closure duration: Short closures
increase voiced responses.

L. Summary of Experiments 6-7

The results of these replication experiments affirm all the major
conclusions of Experiments 1-5, as summarized above. There are two minor
discrepancies: the absence of a stable asymptote of the LAP/LAB boundary
function, and of a clear dissociation of voicing and 1/2 judgments. The
earlier data were clearer in these regards, but the replication must nevertheless
be considered useful in view of the stability of the major findings across
different stimulus materials and subject groups.



III. General Discussion 


The principal aim of the present series of studies was to demonstrate two
effects in the perception of the voicing category of syllable-final stop
consonants, one being due to the addition of a second syllable beginning with a
different stop, and the other reflecting the perceptual contribution of the
closure duration cue in disyllables. An effect of closure duration was
consistently obtained: The shorter the closure, the more likely subjects were to
report a voiced stop consonant. This is in agreement with our measurements of
closure durations in natural speech and suggests that listeners have
incorporated tacit knowledge about these temporal regularities into their
perceptual criteria for the voiced-voiceless distinction. By the same token,
one should expect that this knowledge includes the fact, well-known from
earlier speech production studies and substantiated by our acoustic measurements,
that a syllable contracts when a second syllable is added to it (or
equivalently, that a syllable is lengthened in utterance-final position). Nooteboom
(1973) has demonstrated such perceptual compensation in a task requiring
judgments of phonological vowel length. The present data are consistent with such
a temporal compensation mechanism, which operates in addition to an
independent perceptual effect of closure duration as a voicing cue (see Figure 5).

Although the results are compatible with such a two-process model, they
do not provide compelling evidence in its favor. A more parsimonious
interpretation of the results, at least when considered in isolation from
other findings in speech perception research, is that there is only a single
effect, that of closure duration. That effect, moreover, is unidirectional: A
final stop consonant sounds increasingly "voiced" as closure duration
decreases, but even at closure durations that are optimal for voiceless stops
in disyllables subjects do not give more voiceless responses than they give to
monosyllables that lack any closure duration information. In other words, the
closure duration cue apparently contributes only to the perception of stops as
voiced, not as voiceless. This is not unreasonable: Even though "voicing"
may be used as an abstract cover term for a variety of acoustic manifestations
of a phonological distinction, it may also be understood more narrowly as
designating the common acoustic feature of "presence of low-frequency energy"
within a certain time span (Stevens, Keyser, & Kawasaki, 1985). Absence of
low-frequency energy is the neutral state; that is, voicing is an
acoustically "marked" feature. The problem in applying this view to the present data
lies in the finding that it made little difference whether the second syllable
began with a voiced or a voiceless (aspirated) stop. If the decisive factor
was presence of low-frequency energy within a certain interval following the
offset of the first syllable, aspiration following the closure should not have
increased voiced responses as much as did a voiced (actually, weakly
aspirated) signal.

A specific auditory mechanism that has been discussed in connection with
speech is backward recognition masking (e.g., Massaro, 1975). Although the
ineffectiveness of C1 release bursts in disyllables may be due to auditory
backward masking, the perceptual effect of decreasing closure duration is not
compatible with such an explanation: Masking of either the vowel offset cues
(which favored voiced percepts, since the stimuli were derived from an
original utterance of LAB) or of the perceived duration of the first syllable
(leading to a reduction in subjective vowel duration; see Massaro & Idson,
1978) should have decreased, not increased, voiced responses. It seems,
therefore, that proximity of following context had its effect without interfering
with or altering those auditory properties of the first syllable that feed
into phonetic decisions.

These hypotheses surely do not exhaust the possible mechanisms that may
underlie a unidimensional, unidirectional effect of closure duration. The
failure of two specific accounts, however, raises doubt about whether the
isolated parsimony of a single-process model is indeed preferable to a
two-process model, particularly one that ties in with a multitude of related
observations suggesting that listeners make phonetic decisions in accord with
criteria that reflect the phonetic regularities of the language (Nooteboom,
1973; Nooteboom & Doodeman, 1980; Repp, 1982, 1983c).

A comment is in order concerning the relationship between vowel duration
and closure duration at the voiced/voiceless boundary. For single
intervocalic post-stressed stops, as in DIGGER/DICKER (Port & Dalby, 1982), the ratio of
closure to vowel durations (C/V) at the voicing boundary remains approximately
constant as one of the two temporal variables is manipulated. Port and Dalby
varied vowel duration while determining the boundary on a closure duration
continuum; we manipulated closure duration while determining the boundary on
a vowel duration continuum. In the context of two-stop sequences, closure
duration is a much less salient voicing cue than it is for single intervocalic
stops. Because of the longer closure durations, larger absolute C/V ratios
were to be expected; the question was whether they would remain constant in
the region where closure duration influenced the boundary on the vowel
duration continuum. As can be seen in Figure 8, the answer is negative. The
average C/V ratios from both Experiments 5 and 7 increase as a nearly linear
function of closure duration over the whole range. Thus, a constant-ratio
rule does not hold for these stimuli; such a rule may be restricted to the
specific utterance types considered by Port and Dalby (1982).
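The constant-ratio test described here amounts to dividing each closure duration by the vowel duration boundary obtained at that closure duration. The sketch below uses hypothetical boundary values (not the data of Figure 8) to show why a boundary that levels off while closure duration keeps growing yields a rising C/V ratio; the function names and the 10-percent tolerance are assumptions made for the illustration.

```python
def cv_ratios(closure_ms, vowel_boundary_ms):
    """C/V ratio at the voicing boundary for each closure duration."""
    return [c / v for c, v in zip(closure_ms, vowel_boundary_ms)]


def roughly_constant(values, tolerance=0.10):
    """True if the spread of values is within `tolerance` of their mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) <= tolerance * mean


# Hypothetical boundaries: the vowel duration boundary rises and then
# levels off while closure duration keeps increasing.
closure_ms = [150, 200, 250, 300]
vowel_boundary_ms = [150, 160, 170, 170]

ratios = cv_ratios(closure_ms, vowel_boundary_ms)
print([round(r, 2) for r in ratios])  # [1.0, 1.25, 1.47, 1.76]
print(roughly_constant(ratios))       # False
```

A constant-ratio rule would instead require the vowel boundary to grow in proportion to closure duration, which is incompatible with a boundary that reaches an asymptote.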

Another type of constancy was found in the present data, however: The
sum of vowel duration and closure duration at the 1/2 boundary was
approximately constant. That is, subjects based their "one word"/"two words"
judgments on the onset-to-onset interval between the two syllables and not on the
separation (the silent closure) between them. Perception of the
syllable-final stop as voiced or voiceless seemed to be independent of these timing
judgments. This is in agreement with the recent findings of Miller, Aibel, and
Green (1984), who showed that perception of a particular temporally cued
phonetic contrast was independent of explicit judgments of perceived speaking
rate. It appears that the speech signal supports a variety of independent
judgments that do not interact, though they may combine at higher levels of
organization (cf. Ganong, 1980; Massaro & Cohen, 1983b).

Figure 8. The average C/V ratio as a function of closure duration
(Experiments 5 and 7). The ratios were computed from the average data
shown in Figures 4 and 7, respectively.



On the whole, then, the present results are consistent with the general 
notion that human listeners behave as if they knew all the detailed acoustic 
consequences of articulation, including context-conditioned and
position-specific variation. The perceptual effects of various acoustic cues can
almost always be rationalized by reference to the systematic patterns that
emerge in the acoustic analysis of speech, although, considering the many
factors that play a role in perception, a precise prediction of experimental
results from acoustic regularities is rarely possible. The future development
of a more economic description of speech in terms of dynamic articulatory
processes may ease the burden on the perception theorist.
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Footnotes 

1 Authors such as Klatt (197?) and Port (1981) took the midpoint of the
formant transitions in spectrograms to be the acoustic liquid-vowel boundary.
The oscillographic criterion used here allots most of the transition portion
to the vowel, which is justified on the basis of perceptual data (Raphael ...

2 The syllable durations were similar across the two continua, as intended,
except for differences caused by the discreteness of the pitch ...

The difference in vowel durations is mainly a consequence of the
difference in /l/ duration. The decision to use vowel duration, rather than
syllable duration, in the presentation of results was made at a rather late
stage.

$ In addition, a GOAT-COAT continuum was constructed by substituting 
aspiration noise from COAT for successive pitch periods in GOAT (see the ap- 
pendix in Ganong, 1980, for a description of this procedure). This continuum 
was used in another test administered to the present subjects at the end of 
the first session. That condition investigated the influence of preceding LAB 
or LAP, closure duration, and C1 release bursts on perception of the GOAT/COAT 
distinction. There were no systematic effects of any of these variables; 
thus, VOT appeared to be the only salient cue to the GOAT/COAT distinction in 
these stimuli. 

*The assumption here is that the constant asymptotic vowel duration 
boundary of about 170 ms matches that for isolated LAP/LAB syllables. This is 
supported by a comparison with the results of Experiment 4 (cf. Fig. 3). 

5 Note that the 1/2 boundary did not just bisect the range of closure
durations but was clearly to the left of the center of the stimulus range. 
This indicates that 1/2 judgments were not arbitrary and rested on some 
pre-established internal criteria.

In planning to include the topic of this chapter in a book devoted to
acquired mathematical disabilities, the editors must have assumed not only
that the theoretical problem is interesting but also that the literature
contains enough relevant data to discuss the issues. A full assessment of the
differential roles played by each hemisphere in dealing with different surface
forms of numbers would require the availability of results from a variety of
tasks comparing different kinds of number representation. Ideally, each task
should also have been investigated in the conditions resulting from the
combination between different number surface forms with left and right
hemifield presentations, or with left and right brain injuries. However, such
is certainly not the case, and many conclusions will have to be drawn from
experiments in which the requisite conditions are only partially met. The
existing data further impose two restrictions on the scope of the review:
Only single-digit numbers (hereafter referred to as "numbers," unless
otherwise specified) will be considered and only nonmathematical tasks will be
dealt with.

Before speculating on cognitive processes and mental representations of 
numbers, we should have a good description and classification of what is 
represented in the stimulus. In spite of the restriction of this chapter to 
the differential processing of single-digit numbers according to their surface
form, some space will also be devoted to specifying the notational principles
that underlie multidigit number writing. The fact that the resulting
classification of symbols that will emerge is different for single-digit and
multidigit numbers may highlight what we expect to find, and what has already
been found, when these symbols are considered from the vantage point of the
cognitive neuropsychology of number processing. The first of the five
sections comprising the chapter is therefore devoted to an analysis of
different types of number representations. This is followed by three sections
reviewing and discussing the data from (a) numerical size comparison tasks,
(b) lateral hemifield presentations, and (c) the performances of brain damaged
patients. The fifth section summarizes the main conclusions.

Number Representations 

The Arabic numeral 5, the Roman numeral V, the English written word five,
and the corresponding Chinese character are different arbitrary symbols that
denote the same abstract concept: the number five. In addition to having
different surface forms, these various symbols also belong to different
notational systems when they are used as components of multidigit numbers.
Two distinctions have to be made. One concerns the difference between
numerals and number names and the other, the difference between logographic
and phonographic number representations. The first distinction is better
captured by characterizing multidigit number notational systems and the second
is better illustrated by specifying the surface form of single-digit symbols.

Notational Systems 

The first important distinction to bear in mind concerns the difference 
between numerals and number names. Numerals are special symbols for 
representing numbers visually. In many written languages they coexist with
number names, which are translations of the spoken form, according to the 
writing system of the language. The only numerals extensively used now are 
Arabic numerals. The universality of Arabic numerals contrasts with the 
language specificity of number names, but the main reason for distinguishing 
between numerals and number names lies elsewhere: Only number names allow for 
a term-by-term translation of the spoken multidigit numbers. In other words,
we may write and say "two hundred and thirty three," but we do not usually say 
"two-three-three" or "hundred-hundred-ten-ten-ten-one-one-one" when we are 
confronted with 233 and CCXXXIII. Hence, the rules governing the way in which 
numbers are transcribed differ according to notational systems. 


These rules are better illustrated by Chinese number-writing instead of 
by English or some other alphabetically written language, and by hieroglyphic 
Egyptian instead of Roman numerals. This allows us to capture the essence of 
the underlying notational principle without having to deal with irrelevant and 
confusing features, such as the use of special words to denote the multiples of
10 in English number-naming or the incorporation of a subtractive principle in
the Roman numeral system (e.g., IV instead of IIII). For the few following
examples, let us represent the ranks of the units, tens, and hundreds by U, T,
and H, respectively.

Holender & Peereman: Differential Hemispheric Processing

In the form of hieroglyphic Egyptian used for lapidary inscriptions, one
symbol denoted the unit and a different symbol denoted each of the successive
powers of 10 up to 10,000. The number 543 was written in the form
HHHHHTTTTUUU. The only important aspect of this representation is that it is
based on an additive principle. The conventional grouping of the units of the
same rank and the usual order of writing was irrelevant to the understanding
of the number. On a stone monument of ancient Egypt, the number would have
been written right-to-left (instead of left-to-right as here) and the symbols
would probably have been displayed on more than one line, but our sequence
would have been unequivocally understood, even if it had been written
TTHHUHTTHUHU. The same commutative principle applies to Roman numerals,
except that elements entering into a subtractive relation must be kept
together in their conventional order.

The Chinese number-naming system is also based on an underlying additive 
principle, but a supplementary multiplicative principle allows for suppression 
of the cumbersome repetitions of the symbols belonging to the same rank. This 
entails a different symbol for each unit (u1, u2, ..., u9). The Chinese 543 is
therefore written in the form, u5Hu4Tu3, using five different symbols instead 
of the three needed in hieroglyphic Egyptian. Here too, provided the symbols 
entering into a multiplicative relation are kept together, permuting the terms 
would not transform one number into another. In this case, however, the
psychological impact of doing so would be stronger, because although the order
of the elements plays no intrinsic role in the representation, their usual
order corresponds to their order of utterance in the spoken number.
Similarly, it may be unusual to write "twenty eight and four hundred" but it 
clearly means 428. The Chinese number writing principle has been called a 
"named place-value" notation by Menninger (1969), as opposed to the "abstract 
place-value" notation realized with Arabic numerals. English number-name 
writing is also a named place-value and it should be clear from what we have 
said that it is a pseudopositional system. 
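
The additive and named place-value principles described above can be
sketched as follows. This is only an illustration: the function names are
hypothetical, and writing the units term as an explicit (unit, "U") pair is
an assumption made for uniformity (in the u5Hu4Tu3 form the units carry no
rank symbol).

```python
# Rank symbols used in the text: U = units, T = tens, H = hundreds.
RANK_VALUE = {"U": 1, "T": 10, "H": 100}


def additive_value(symbols):
    """Additive principle: the number is the sum of the rank symbols,
    whatever their order (e.g., 'HHHHHTTTTUUU')."""
    return sum(RANK_VALUE[s] for s in symbols)


def named_place_value(terms):
    """Named place-value principle: each term is a (unit, rank) pair,
    multiplied within the pair and added across pairs."""
    return sum(unit * RANK_VALUE[rank] for unit, rank in terms)


# Permuting the symbols does not change an additive numeral.
print(additive_value("HHHHHTTTTUUU"))  # 543
print(additive_value("TTHHUHTTHUHU"))  # 543

# Permuting the terms does not change a named place-value numeral,
# provided each unit stays with its rank.
print(named_place_value([(5, "H"), (4, "T"), (3, "U")]))  # 543
print(named_place_value([(3, "U"), (5, "H"), (4, "T")]))  # 543
```

In both cases the value is a sum over terms, which is why the order of the
terms plays no intrinsic role in the representation.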

The only true positional number-writing system still in use was developed 
some time in the first half of the sixth century A.D. in India, whence it 
spread more or less rapidly to the whole world. The system uses only 10 
symbols, named Arabic numerals after their first principal propagators rather 
than after their creators. In this system the rank of the units is abstractly
symbolized by the position occupied by these units in the written number.
Permutations of terms are no longer allowed without changing the value of the
number and the whole system works only because of the great intellectual
accomplishment of symbolizing nothing by something; namely, by using zero to 
fill in the positions of the unemployed ranks (compare the English three 
thousand and twenty, in which nothing stands for the unused ranks of the 
hundreds and the units, with 3020). Aside from the Greeks' ephemeral use of a 
complete abstract place-value system including a zero, the only known 
independent invention of such a notation took place in Mesoamerica. The 
extent to which the Mayas really grasped the concept of zero is likely to 
remain controversial forever, but they undoubtedly used a symbol functionally
equivalent to zero in their place-value notation of numbers (e.g., Kelley, 
1976). 
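
By contrast, in an abstract place-value notation the rank is carried by
position alone. A minimal sketch, assuming base 10 and a hypothetical helper
name, shows why permutations change the value and why zero is needed to hold
the unused ranks:

```python
def positional_value(digits):
    """Abstract place-value principle: each digit is weighted by the
    power of 10 implied by its position in the string."""
    value = 0
    for d in digits:
        value = value * 10 + int(d)
    return value


print(positional_value("3020"))  # 3020: zeros hold the hundreds and units
print(positional_value("3200"))  # 3200: same symbols, permuted, new value
print(positional_value("32"))    # 32: dropping the zeros loses the ranks
```

The spoken "three thousand and twenty" names its ranks and so needs no
placeholder, whereas the written 3020 cannot do without one.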

In Europe, widespread use of the Arabic numeral place-value notation
began toward the end of the fifteenth century, rapidly supplanting the Roman
numerals from then on. The most important consequence of this event is that 
calculation, mainly realized by means of counting boards and quite independent 
of number writing before this date, now became intimately bound to the Arabic 
numeral notational system. 

Much of what precedes can be found in the extensive and insightful
coverage of the topic by Menninger (1969; see Flegg, 1983, for a condensed
account). From an information-processing point of view, this rapid survey of
the number notational systems still in use reveals three important points.

1. The Arabic numerals stand alone in being the only symbols that enter
into an abstract place-value notation, an inherently positional system, and in
being used for purposes of calculation, a highly specialized cognitive activity
of symbol manipulation. 

2. Number names, whether written in an alphabetic script, such as 
English, or in a logographic script (see below) such as Chinese, constitute a 
different notational system whose purpose is mainly to provide a visual 
term-by-term translation of spoken numbers. 

3. Unlike Arabic numerals and number names, Roman numerals are more
concrete representations of numbers, combining some properties of tally counts
with simple additive and subtractive rules. They are quite easily decoded,
but they are no longer widely used, and they have never been considered an
efficient medium for calculation. 

Symbol Surface Forms 

Number names are represented according to the writing systems in use for 
general writing purposes. We shall therefore distinguish between logographic
and phonographic systems. In a logographic system the written symbols
represent linguistic units of meaning; namely, morphemes. In phonographic
systems, the linguistic units represented by each symbol are phonological,
being either syllables in syllabic systems or phonemes in alphabetic systems
(see Gelb, 1963, for a history and description of the writing systems. For
discussions of the psycholinguistic aspects of the written symbols and their 
consequences for the analysis of mental processes involved in reading, see 
Gleitman & Rozin, 1977; Henderson, 1984; Liberman, Liberman, Mattingly, & 
Shankweiler, 1980; Mattingly, 1984; Rozin & Gleitman, 1977). 

In Chinese writing, the most complete logographic system ever designed 
and still in use today, each symbol represents one morpheme. Each morpheme is 
also a word, although many words are composed of more than one morpheme, and 
are therefore written with more than one character. Chinese characters are 
often called ideograms, but this terminology is misleading because few 
characters are actually designed on a truly ideographic principle. We shall 
call the characters logograms to fit the linguistic description of the unit 
they represent. 

As already mentioned, nine symbols represent the numbers one to nine in 
Chinese. The first three consist of one, two, and three horizontal strokes
and the others are arbitrary symbols. Thus the first three symbols are built
on an ideographic, or even a pictographic, principle, representing the
beginning of a stick count. Knowing that they stand for numbers, someone who
cannot read Chinese at all would be able to interpret them correctly; but
this is not the case with the symbols for the numbers four to nine. It is
nonetheless clear that the two horizontal strokes stand for the monomorphemic
word meaning two in Chinese and that the arbitrary symbol representing the
number "six" stands for the monomorphemic word meaning six in Chinese. Hence,
the exact nature of any of these symbols is certainly better captured by the
term logogram than by any other term.

In Japan, many Chinese characters, called Kanji characters, have been
borrowed to be used conjointly with a syllabary. The simple syllabic
structure of Japanese allows any word of the language to be written by using
only the symbols of the syllabary. These symbols are called Kana and they
exist in two forms: Hiragana and Katakana. In a normal text the content
morphemes (mainly nouns, verbs, and adjectives) are usually written in Kanji 
and the grammatical morphemes are written in Hiragana; foreign loan words are 
written exclusively in Katakana. 

Japanese number names are represented in Kanji, the characters being
exactly those used in China. Like any other words they can also be written in
Hiragana and Katakana, but this seldom, if ever, occurs in daily life. This
point should be kept in mind in interpreting the results of experiments that 
have exploited this possibility. 

The most important point of this entire discussion is that although the 
10 Arabic numerals can be considered as logographic representations of the 
numbers zero to nine, the nine Chinese (or Japanese) logograms (zero not being
represented) are not numerals, but number names. This fact has not always
been correctly evaluated, either in the recent psychological literature, or by
Menninger (1969), who was struck by the fact that the Chinese number symbols
realize a perfect synthesis, being both numerals and number names. That this
position is incorrect can be appreciated from the fact that throughout
history, Chinese number names have coexisted with genuine autochthonous
numerals (incorporating the Indian zero, but not the other symbols, in the
thirteenth century). These have now been replaced by Arabic numerals. Hence,
the relation between Chinese characters (or Japanese Kanji), denoting 
single-digit numbers, and Arabic numerals is exactly the same as that between 
the corresponding English alphabetically written words and these very same 
Arabic numerals. 

This is, of course, the conclusion we reached in our discussion about 
number notational systems. It is clear that symbols do not lose their
identity as number names or as numerals when they denote single-digit numbers.
Nevertheless, in dealing with single-digit rather than multidigit numbers,
processing operations should be more dependent on the surface form of the
symbols than on the notational system to which these symbols belong. 
Therefore, in investigating the processing of single-digit numbers considered 
as lexical units, it is a priori more natural to regroup the symbols with 
respect to their surface forms irrespective of the notational system. 
Accordingly, in what follows, Arabic numerals and Chinese or Japanese Kanji 
number names are subsumed under the logographic category, while the generic
term phonographic is applied to number names written alphabetically or in
Hiragana (hereafter simply referred to as Kana because the Katakana form has
not yet been used). 

Roman numerals are part of a different notational system, but their
surface form can be considered as logographic. 

Numerical Size Comparison Judgments

A common experimental task calls on subjects to judge which of two 
simultaneously presented Arabic numerals is the larger (less often, the 
smaller) numerically, with response latency as the dependent variable. Such 
experiments have provided a rich pattern of results revealing at least four 
different effects: symbolic distance, serial position, semantic congruity, 
and size congruity. This abundance of effects (not confined to the comparison 
of numerical sizes, but apparent also in many other comparative
judgment tasks) has recently been the subject of much theorizing (see Banks,
1977; Moyer & Dumais, 1978, for reviews). For our purposes, the main point
of interest is the possibility of observing different configurations of 
results as a function of the surface form of the numbers. In what follows 
each effect will be briefly characterized and studies contrasting different 
types of number representations will be reviewed and discussed. 

Symbolic Distance Effect 

The latency of the comparative judgment is an inverse function of the
subtractive difference between the two numbers; for example, subjects are
faster in judging that 7 is the larger in a pair like 2-7 than in a pair like
5-7 (Aiken & Williams, 1968; Banks, Fujii, & Kayra-Stuart, 1976; Buckley &
Gillman, 1974; Duncan & McFarland, 1980; Moyer & Landauer, 1967; Parkman,
1971; Sekuler & Mierkiewicz, 1977). The effect is also observed when numbers
are symbolized by patterns of dots (Buckley & Gillman, 1974) or with numbers
written in Kana and in Kanji (Takahashi & Green, 1983). In the latter study,
distances of 1, 3, and 5 were compared; the general trend was the same for
both kinds of script, but the detailed pattern of results was slightly 
different in each case. With Kana stimuli there was a relatively small 
decrease in reaction time between distances 1 and 3 and a relatively large 
decrease between distances 3 and 5, whereas with Kanji the opposite 
configuration was observed, a large decrease between distances 1 and 3 and a 
small one between distances 3 and 5. Since only 12 out of the 36 possible
pairs were studied, the effect could have arisen from an interaction between
the relative coding difficulty of the pairs, symbolic distance, and type of 
script, rather than from a different comparison process taking place with each 
kind of script. This is a likely possibility in view of the absence of 
interaction between symbolic distance and type of script (Arabic numerals
vs. alphabetic number names) in the experiment of Foltz, Poltrock, and Potts 
(1984, Experiment 2). In this case, the complete set of 36 pairs was used. 
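
The figure of 36 possible pairs, and the way those pairs are distributed over
symbolic distances, can be checked with a short sketch. It assumes unordered
pairs of distinct numbers from 1 to 9; the variable names are illustrative
only.

```python
from collections import Counter
from itertools import combinations

# All unordered pairs of distinct single-digit numbers 1 to 9.
pairs = list(combinations(range(1, 10), 2))

# Symbolic distance = subtractive difference between the two members.
pairs_per_distance = Counter(larger - smaller for smaller, larger in pairs)

print(len(pairs))                       # 36
for distance in (1, 3, 5):
    print(distance, pairs_per_distance[distance])
```

A distance d is realized by 9 - d pairs, so small distances are represented
by more pairs than large ones. Any subset of 12 pairs therefore samples each
distance unevenly unless the pairs are chosen with care.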

Serial Position Effect 

In the present framework serial position refers to the position of each 
member of a pair of numbers relative to the boundaries of the ordered sequence 
of single-digit numbers. For a given symbolic distance, pairs composed of
small numbers (e.g., 1-3, 2-4) are compared more rapidly than pairs composed
of large numbers (e.g., 6-8, 7-9). The effect, often expressed as an increase
in reaction time as a function of the increase in the smaller member of each
pair, has been consistently observed with Arabic numerals (Aiken & Williams,
1968; Buckley & Gillman, 1974; Parkman, 1971). As for symbolic distance,
the serial position effect was also obtained with numbers symbolized by
patterns of dots (Buckley & Gillman, 1974), and there was no interaction
between the serial position effect and the type of script (Arabic numerals
vs. alphabetically written names) in the study of Foltz et al. (1984,
Experiment 2).

Semantic Congruity Effect 

This effect was identified by Banks, Clark, and Lucy (1975). It results 
from an interaction between the way the instructions are formulated with 
respect to the boundaries of the ordered set of numbers and the position of 
the pair of numbers with respect to these boundaries. With small numbers 
(e.g., 2-4) subjects make their comparisons more rapidly under the instruction
"choose the smaller" than under the instruction "choose the larger."
Conversely, with larger numbers (e.g., 6-8) decisions are reached more rapidly
under the instruction "choose the larger" than under the instruction "choose
the smaller." The semantic congruity effect has been observed twice with
Arabic numerals (Banks et al., 1976; Duncan & McFarland, 1980). Although the
effect has not yet been investigated with other number representations, it is
unlikely that the outcome of such a study would show differential effects
according to the surface form of the numbers. One reason for this is that in
judging the size of two objects or the intelligence of two animals, the
semantic congruity effect has been found to be independent of the
representation of the referents as pictures or as alphabetically written names
(Banks & Flora, 1977).

At present, the picture that emerges from contrasting logographic and
phonographic representations of numbers in numerical comparison judgments is
incomplete, but quite consistent. As regards the symbolic distance and serial
position effects there is no evidence that the task is performed
differentially according to the surface form of the stimuli, and with respect
to the semantic congruity effect the relevant information is not yet
available. For the size congruity effect, to be described next, the results
are more contradictory; this is also the case for experiments using lateral
hemifield presentations in numerical size comparisons. In order to draw some
tentative conclusions from these data a more detailed analysis will be
necessary than has sufficed for the three effects discussed before.

Size Congruity Effect

This effect, labeled by Banks and Flora (1977), was first observed by 
Paivio (1975) in a size comparison task involving objects represented either
by pictures or by words. It appeared in a Stroop-like situation in which an
irrelevant dimension, the relative physical size of each member of a pair of
stimuli, was combined orthogonally with the relative real sizes of the
referents. In a congruent trial the stimulus referring to the larger object
was also physically larger than the other. In an incongruent trial the
stimulus referring to the larger object was physically smaller. Neutral
trials in which both members of the pairs of stimuli were the same physical
size were also included. Paivio observed a size congruity effect with
pictures, the mean response latency being 89 ms faster for congruent than for
incongruent trials. The most striking result was that there was no congruity
effect at all when the same referents were represented by words instead of
pictures. As regards number comparisons, this Stroop-like task was first used
by Besner and Coltheart (1979) who obtained results parallel to those of
Paivio; namely, a large size congruity effect with Arabic numerals and no
effect at all with the alphabetical representations of the numbers.
Subsequent experiments confirmed the result with Arabic numerals, but were
discrepant with the initial study in showing a large size congruity effect
with alphabetical number names as well (Besner, Davelaar, Alcott, & Parry,
1984; Foltz et al., 1984; Peereman & Holender, 1984). The size congruity
effect was also observed with numbers written in Kanji whereas Kana numbers
showed ambiguous results (Takahashi & Green, 1983).
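
The three trial types of this Stroop-like task can be made explicit with a
small sketch. The function name and the use of arbitrary physical size units
are assumptions introduced only for illustration; they are not taken from any
of the studies cited.

```python
def classify_trial(value_a, size_a, value_b, size_b):
    """Classify a pair of stimuli in the size congruity task.

    value_* is the numerical (relevant) magnitude of each stimulus;
    size_* is its physical (irrelevant) size in arbitrary units.
    """
    if value_a == value_b:
        raise ValueError("the two numbers must differ")
    if size_a == size_b:
        return "neutral"
    numerically_larger_is_a = value_a > value_b
    physically_larger_is_a = size_a > size_b
    if numerically_larger_is_a == physically_larger_is_a:
        return "congruent"
    return "incongruent"


print(classify_trial(7, 2.0, 2, 1.0))  # congruent
print(classify_trial(7, 1.0, 2, 2.0))  # incongruent
print(classify_trial(7, 1.5, 2, 1.5))  # neutral
```

Because physical size is crossed orthogonally with numerical size, each pair
of numbers can appear under all three trial types.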

Table 1 summarizes the main results of the experiments published so far,
except for some forthcoming data of the second author (Peereman, in
preparation). In addition to presenting the mean reaction time for each type
of trial (congruent, neutral, and incongruent), the table also splits the
congruity effect into facilitation and interference effects. The facilitation
effect is obtained by subtracting the mean latency of congruent trials from
the mean latency of neutral trials, while subtracting the latter from the mean
latency of incongruent trials yields the interference effect. A few more
procedural details are worth describing before we discuss the results. In
some experiments the numbers were presented side by side, to left and right of
a fixation point, and responses were made on a left and right response key
(Besner & Coltheart, 1979, logographic condition; Foltz et al., 1984; Henik
& Tzelgov, 1982). The other experiments used numbers displayed above and
below a fixation point, and responses were made either on two vertically
aligned response keys (Besner & Coltheart, 1979, alphabetic condition;
Takahashi & Green, 1983) or by activating a forward-backward switch (Peereman
& Holender, 1984; Peereman, in preparation). Only one study used the
complete set of 36 pairs generated by using the numbers 1 to 9 (Foltz et al.,
1984), whereas the others used only a small subset of these pairs, from 4 to
12 according to the experiment. In addition to central presentations,
Peereman and Holender (1984) also included lateral ones and Peereman (in
preparation) contrasted the usual manual response with a vocal response, the
naming of the larger number.

Table 1

Size Congruity Effect in Numerical Size Comparison Judgments (a)

                                       Logographic          Phonographic
                     Experiment or    C     N     I        C     N     I
Authors              condition         N - C   I - N        N - C   I - N

Henik & Tzelgov,     Exp. 1          588   624   696
1982 (b, c)                             36      72

Besner &                             531   542   586      800    -    800
Coltheart, 1979                         11      44

Foltz et al.,        Exp. 2          561   585   641      719   762   795
1984                                    24      56           43      33

Peereman &           Central         472   500   561      719   724   756
Holender, 1984       field              28      61            5      32

                     Left            481   523   577      717   755   777
                     field              42      54           38      22

                     Right           472   528   577      701   745   759
                     field              56      49           44      14

Peereman,            Manual          520   552   622      749   784   805
in preparation       response           32      70           35      21

                     Vocal           568   602   694      751   768   795
                     response           34      92           17      27

Takahashi &                          752   790   855     1076  1054  1095
Green, 1983 (b, d)                      38      65          -22      41

(a) C = congruent, N = neutral, I = incongruent. (b) Data estimated from a
graph. (c) First session only. (d) Data pooled over sessions 1 and 2.
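
The decomposition of the congruity effect used in Table 1 amounts to two
subtractions. The sketch below is an illustration with a hypothetical
function name; the three latencies are the logographic means listed in
Table 1 for the manual response condition of Peereman (in preparation).

```python
def congruity_effects(congruent_ms, neutral_ms, incongruent_ms):
    """Split the size congruity effect into its two components.

    facilitation = N - C, interference = I - N; their sum (I - C) is the
    overall congruity effect.
    """
    facilitation = neutral_ms - congruent_ms
    interference = incongruent_ms - neutral_ms
    return facilitation, interference, facilitation + interference


facilitation, interference, overall = congruity_effects(520, 552, 622)
print(facilitation, interference, overall)  # 32 70 102
```

A negative facilitation value means that congruent trials were slower than
neutral ones, that is, a detrimental effect of congruence.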

The left side of Table 1 shows the results with logographic scripts, 
i.e., Kanji numbers in the experiment of Takahashi and Green (1983), and 
Arabic numerals in all the other cases. The main results can be summarized as 
follows.

1. There is a large overall size congruity effect (sum of the
facilitation and interference effects in Table 1) in each experiment. The
magnitude of the effect tends to increase with the increase in the absolute
level of performance.

2. Both the facilitation and the interference effects are substantial in
each experiment (except for a very small facilitation effect in the experiment
of Besner and Coltheart). With central presentations, the magnitude of the
facilitation effect is in the range of 20 to 60% of the magnitude of the
interference effect. With lateral presentations (Peereman & Holender, 1984),
the ratio of the two effects is closer to 1.

3. The Kanji numbers used by Takahashi and Green (1983) behave in pretty 
much the same way as the Arabic numerals used in the other experiments, except 
that response latencies are much longer than with Arabic numerals, probably 
because Kanji numbers are not widely used. 

The right side of Table 1 shows the results for phonographic scripts, the
syllabic Kana writing in Takahashi and Green's report and alphabetic writing
in all the other cases. The most prominent aspects of the results are the
following.

1. Overall response latencies are in the range of 200 to 250 ms longer 
than with logographic numbers. The absence of a congruity effect, reported by 
Besner and Coltheart (1979), is not confirmed in subsequent experiments,
although the effect tends to be a bit smaller than with logographic numbers.
There is no systematic relation between the absolute level of performance and
the magnitude of the size congruity effect.

2. With central presentations, much of the size congruity effect is due 
to the interference caused by incongruent trials, congruent trials provoking 
almost no facilitation or even a detrimental effect (Takahashi & Green, 1983). 
With lateral presentations, the opposite tendency is observed; that is,
strong facilitation effects and weak interference effects (Peereman & 
Holender, 1984). 

3. Kana numbers (Takahashi & Green, 1983) are responded to much more slowly
than alphabetic numbers, but this form of representation is almost never used
outside the laboratory. There is also a reversal in the facilitation effect.

Besner and Coltheart (1979) and the presence of such an effect in the two
other experiments (Foltz et al., 1984; Peereman & Holender, 1984). Foltz et
al. interpreted the difference between their results and those of Besner and
Coltheart as due to their use of a repeated-set design instead of the
fixed-pair design of the conflicting experiment. In a repeated-set design
each item (the numbers 1 to 9) is paired equally often with each other item,
whereas only a small subset of these pairs is used repeatedly in a fixed-pair
design (12 pairs repeated 20 times and 9 pairs repeated 10 times in the
logographic and alphabetic conditions of Besner and Coltheart, respectively)
and each item is paired with only a few other items (one, two, or three in
Besner and Coltheart's experiment). It is argued that there is an increasing
probability of bypassing the comparison stage as a function of the increase in
the number of repetitions in the fixed-pair design: Subjects may respond on
the basis of specific response-pair associations established during the
experiment. This accounts for the lack of a size congruity effect in Besner
and Coltheart's fixed-pair design and the presence of such an effect in Foltz
et al.'s repeated-set design. Moreover, the prediction was nicely supported
in a study using names of objects (Experiment 1 of Foltz et al.) where in the
fixed-pair design of Paivio (1975, six pairs repeated eight times, each item
being paired with only one other item), no size congruity effect was observed.
However, with an infinite-set design in which 48 different pairs were
presented only once, as if they were drawn from an infinite set of pairs, a
strong 115-ms size congruity effect was obtained, which reduced to 49 ms after
three further presentations of the set. Recently, Besner et al. (1984,
p. 127) also alluded to the observation of a size congruity effect in using a
larger set of alphabetic numbers than in the original experiment of Besner and
Coltheart (1979). There is, however, one result that is clearly at odds with
this interpretation. In our alphabetical condition (Peereman & Holender,
1984), only four different pairs were repeated 72 times, each of four numbers
being paired with only two other numbers. This should have maximized the
chances of bypassing the comparison stage, thereby suppressing the
size-congruity effect, but this did not happen.
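
The structural difference between the two designs can be illustrated by
counting, for each item, how many different items it is paired with. In the
sketch below the helper name is hypothetical, and the four fixed pairs are
invented solely to reproduce the stated structure (four numbers, each paired
with two others); they are not the pairs actually used in any experiment.

```python
from collections import defaultdict
from itertools import combinations


def partners_per_item(pairs):
    """Return, for each item, the number of distinct items it is paired with."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {item: len(others) for item, others in partners.items()}


# Repeated-set design: every number from 1 to 9 meets every other number.
repeated_set = list(combinations(range(1, 10), 2))

# Fixed-pair design (hypothetical pairs): four pairs over four numbers.
fixed_pair = [(1, 2), (2, 8), (8, 9), (1, 9)]

print(len(repeated_set), partners_per_item(repeated_set)[1])  # 36 8
print(len(fixed_pair), partners_per_item(fixed_pair))
```

With so few partners per item and many repetitions, a fixed-pair design
offers far more opportunity to learn specific pair-response associations than
a repeated-set design does.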

A further assumption is needed to account for the fact that the 
repeated-set design does not suppress the size congruity effect when Arabic
numerals are used instead of alphabetic number names. Foltz et al. (1984)
suggested that, because pictures or Arabic numerals provide much shorter
latencies than their spelled names, retrieving and comparing the size
information could be faster than retrieving the appropriate previously learned
response in the former but not in the latter case. This is a completely ad hoc
interpretation. In addition, it cannot explain why, in a fixed-pair design,
Takahashi and Green (1983) observed a very strong size congruity effect with
Kanji numbers in spite of the fact that the absolute level of performance was
equivalent to that of Besner and Coltheart (1979) in the alphabetic condition
(see Table 1). In such a case, according to Foltz et al.'s interpretation, 
the retrieval of previously associated responses should have been faster than 
the size retrieval and comparison process, leading to no size congruity 
effect. 

For other tendencies revealed in Table 1, such as the smaller congruity 
effect with phonographic than with logographic script and the different ratios 
between the facilitation and the interference effect with each kind of script, 
no unequivocal conclusion can be drawn at present. The problem is that the
situation is a little too complicated. Several confounding factors whose roles
are not well understood could be responsible for these effects. Moreover,
none of them might give any interesting hint toward a possible differential
role of the surface form of the stimuli in the operations needed to perform
the numerical size comparison judgment. Let us mention two such confounding
factors. 

1. The relative salience of the irrelevant dimension affects the
magnitude of its influence on the decision about the relevant dimension
(Besner & Coltheart, 1976; Dixon & Just, 1978). In the present context, the
salience of the irrelevant dimension may well be influenced by the factors 
affecting the judgment of dissimilarity between rectangles, because, roughly 
speaking, the areas occupied by Arabic numerals or by upper case number names 
are rectangular in shape. The psychophysics of dissimilarity judgments 
between rectangles varying in shape and area (Krantz & Tversky, 1975; Wender, 
1971; Wiener-Ehrlich, 1978) is surprisingly complex, no simple dimensional 
structure emerging from the data. There are two ways in which the data
discussed in this section could be affected by these psychophysical factors.
First, the difference between the physical size of two Arabic numerals can
simply be more conspicuous than that between two multiletter words, leading to
a stronger size congruity effect in the former than in the latter case.
Second, the speed with which a dissimilarity judgment can be made, or for our
purposes, the speed with which the difference in size becomes compelling,
should depend on the magnitude of the physical difference, at least within a
certain range. This could be responsible for subtle differences between the 
magnitude of the interference and facilitation effects according to type of 
script. 

2. From our experience with the task, we know that the magnitude of the 
congruity effect and the relative magnitudes of the facilitation and
interference effects vary considerably between different pairs of numbers,
especially with the alphabetical representation. Because we used only a small
subset of pairs in our experiments, it is hard to find any systematic factor
underlying either the intra-surface form or the inter-surface form
variability. We nevertheless suspect that some pairs are more easily encoded
than others, thus affecting the time at which the information becomes
available for performing the comparison operation. This could, of course,
generate different patterns of results between experiments using different
subsets of pairs.

These two confounding factors emphasize the role that the relative time
course of processing both the relevant and irrelevant aspects of the pairs of
stimuli might play in the determination of the size congruity effect,
independent of the comparison process itself. Of course, this could be
systematically studied, but we then run the risk of completely losing sight of 
the real goal of this research, which is precisely to investigate whether or 
not the surface form of the stimuli affects the numerical size comparison 
operations, not to untangle the complexity of Stroop-like situations. 

Hemifield Presentations

The rationale for using hemifield presentations of stimuli will be
explained in the next main section of the paper. Suffice it to say here that
a relatively better performance for stimuli displayed in one hemifield than in
the other is generally interpreted in terms of a contralateral hemispheric
superiority for a particular class of stimuli or for a particular experimental
task. The investigation of lateral presentations of numbers for comparison of 
their numerical magnitudes has led to a perplexing picture, because every 
possible outcome has been reported. Katz (1980, 1981) found a left visual 
field (LVF) advantage; Besner, Grimsell, and Davis (1979), a right visual 
field (RVF) advantage; and Peereman and Holender (1984), no difference 
between fields. 

The opposite field advantages of Katz (1980, 1981) and Besner et 
al. (1979) can be explained by the difference in the exposure durations that
were used. A short exposure duration, 50 ms in Katz's experiments, could
engender a LVF advantage that has little to do either with the specific
material presented or the specific task performed, but is determined rather by
the nature of the available visual information (see Sergent, 1983a, 1983b).
According to Sergent, the right hemisphere is more efficient at extracting
the relevant information from low spatial frequencies than from high spatial
frequencies, and vice versa for the left hemisphere. Physical
parameters such as very short exposure duration, large stimulus size, and
large eccentricity should favor processing on the basis of low spatial
frequencies, therefore increasing the odds of finding a LVF advantage whatever
the type of stimulus. On the other hand, long exposure durations, such as the
150 ms used by Besner et al. (1979), generally lead to a RVF advantage, which
was indeed
observed in this particular study. Notice, however, that the authors strongly 
favored an interpretation of their field advantage in terms of a left 
hemispheric superiority for performing the comparison process rather than for 
encoding the stimuli. 

Why then, using a relatively long exposure duration of 120 ms, did 
Peereman and Holender (1984) fail to show any laterality effect? There is no 
ready interpretation for the discrepancy between their results and those of 
Besner et al. (1979). However, some tentative suggestions can be made. 

The combination of left and right presentations with responses that are 
also spatialized along the left-right dimension may generate the compatibility 
effect first reported by Simon (Craft & Simon, 1970; Simon & Rudell, 1967). 
Asking their subjects to press a right key at the sound of a high tone and a 
left key at the sound of a low tone (Simon & Rudell, 1967), or to associate 
the right key with a red bulb and the left key with a green bulb (Craft & 
Simon, 1970), Simon and his collaborators observed that the right side 
response was made faster if the stimulus was presented in the right hemispace 
rather than in the left hemispace, and conversely for the left side response. 
This compatibility effect has been described as a tendency to react toward the 
source of stimulation. It is genuinely a semantic congruity effect similar to 
that of Banks et al. (1976), discussed earlier, because the coding of 
responses in terms of left and right entails an unavoidable influence of the 
coding of stimulus location in the same terms, thereby facilitating or 
interfering with the response according to the congruency or incongruence of 
stimulus and response positions. This compatibility effect has also been 
observed with lateralized presentations of pairs of numbers. Besner et al. 
(1979) found that right-index responses were shorter for displays presented in 
the RVF than in the LVF and vice versa for left-index responses. The same was 
true for the relation between the rightmost or leftmost finger and the visual 
field when two fingers of the same hand were used to make the response (Katz, 
1981). However, with bimanual responses Katz (1980, Experiment 1) failed to 
find the effect, a surprising outcome in view of the usual robustness of the 
phenomenon. 

Figure 1 illustrates what happens when the side of presentation affects 
each response in an opposite direction through one factor (compatibility) and 
in the same direction through another factor (the presumed hemispheric 
superiority). In Besner et al.'s experiment, the RVF advantage, which is very 
strong for the right response, gives way to a small (nonsignificant) LVF 
advantage for the left response. Similarly, in Katz's (1981) experiment, the 
LVF advantage, which is very strong for the left response, is considerably 
reduced for the right response. The explanation of this phenomenon goes as 
follows, taking Besner et al.'s results as the basis for the reasoning. For 
the right response, compatibility and the assumed left hemispheric superiority 
add their effects to enhance performance with RVF presentations and to impede 
performance with LVF presentations, thereby inducing a large field difference. 
For the left response, the advantage of the stimulus being presented in the 
LVF due to compatibility is counteracted by the disadvantage of being first 
[...] left hemispheric superiority is counterbalanced by the cost of being on 
the wrong side in terms of compatibility, thereby reducing, and even reversing 
[...] 

Figure 1. Mean reaction time for judging which is the larger of two numbers 
as a function of visual field (LVF = left visual field, RVF = right 
visual field) and response side (LR = left response, RR = right 
response) in the experiments of Besner et al. (1979) and Katz 
(1981). The data were estimated from graphs. 

The existence of a compatibility effect that interacts in a complex 
manner with a laterality effect, presumably linked to hemispheric superiority, 
is an obstacle to the study of this phenomenon per se. It would therefore be 
better to get rid of the compatibility effect by suppressing the left-right 
polarity of the response, presenting the pairs of numbers vertically and 
replacing the left-right responses with forward-backward responses, exactly 
the procedure used by Peereman and Holender (1984). If the left hemisphere is 
really better than the right in performing the comparison process, as Besner 
et al. (1979) believed, it should be so whatever the spatial disposition of 
the numbers in the pair, and the ensuing RVF advantage should show up 
uncontaminated by the compatibility effect. However, as we have already 
pointed out, the RVF advantage disappeared altogether when we followed this 
procedure. Thus, the factor that combined with compatibility to determine the 
pattern of results found by Besner et al. (see Figure 1) was not a left 
hemisphere superiority for the task, but something else. 
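
The additive reasoning applied to Figure 1 can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions, not the 
analysis of the studies cited: the function names, the baseline, and the 
millisecond values are hypothetical, and the two factors are simply assumed to 
add. 

```python
# Hypothetical sketch: two additive influences on reaction time (RT).
# hemi_ms   : assumed RT benefit of a RVF presentation (left hemisphere superiority)
# compat_ms : assumed RT benefit when stimulus side and response side coincide
# All values are illustrative, not data from Besner et al. (1979) or Katz (1981).

def reaction_time(field, response, base_ms=500.0, hemi_ms=20.0, compat_ms=30.0):
    """Predicted mean RT for a visual field ('LVF'/'RVF') and a response side ('LR'/'RR')."""
    hemi = -hemi_ms / 2 if field == "RVF" else hemi_ms / 2
    compatible = (field == "RVF") == (response == "RR")
    compat = -compat_ms / 2 if compatible else compat_ms / 2
    return base_ms + hemi + compat


def rvf_advantage(response, **params):
    """RT(LVF) minus RT(RVF); a positive value means a RVF advantage."""
    return reaction_time("LVF", response, **params) - reaction_time("RVF", response, **params)


print(rvf_advantage("RR"))  # both factors favor the RVF: hemi_ms + compat_ms
print(rvf_advantage("LR"))  # the factors oppose each other: hemi_ms - compat_ms
```

With these invented values the right response shows a large RVF advantage and 
the left response a small reversal, which is the qualitative pattern described 
for Besner et al.'s results. 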

What else? We do not know, but we suggest looking to other forms of 
compatibility that almost certainly play a role when left-right polarized 
displays and responses are involved in the comparison of the numerical 
magnitude of numbers. For instance, in deciding which number is the larger, 
the response is faster if the larger number is on the right side of the pair 
(e.g., 3-7) than with the opposite configuration (7-3). This effect was as 
large as 30 ms in the experiment of Aiken and Williams (1968), using 18 pairs 
among the 36 possible, and 20 ms in Experiment 2 of Banks et al. (1976), using 
21 pairs. However, the effect was null in Banks et al.'s Experiment 1 
involving only six pairs, suggesting possible interactions with specific 
characteristics of the pairs. 

A last point should be stressed. Peereman and Holender's (1984) 
experiment is the only one fulfilling the requirements of this chapter for 
numerical size comparisons; that is to say, it is the only study that 
combines factorially the type of script (Arabic numerals and their French 
alphabetic names) with the side of presentation. It is clear from Table 1 
that there is no field advantage whatever the type of script, a conclusion 
that can probably be safely accepted. Whether there is evidence for a 
differential influence of the type of script on the comparison process cannot 
be answered on the basis of these data because, as remarked in the preceding 
subsection concerning the size congruity effect, several possible confounding 
factors must be controlled before any reliable conclusion can be reached. 



Lateral Hemifield Presentations 



Rationale Underlying the Approach 

From the standpoint of understanding how numbers represented 
logographically or phonographically are processed, the study of laterality 
should be considered as one of the tools for analysis of processing operations 
into components. However, the extent to which the method succeeds in doing so 
depends on a number of difficult, unsettled issues. 

In vision, provided gaze fixation is controlled, it is a matter of 
anatomical fact that a stimulus displayed laterally in the LVF or in the RVF 
is first channeled to the contralateral hemisphere and that its access to the 
homolateral hemisphere depends on its transit through the interhemispheric 
commissures. The most common interpretation of a better performance in one 
hemifield than in the other is in terms of a greater ability of the 
contralateral hemisphere to perform the task. In other words, a given 
hemifield advantage is almost automatically translated into a contralateral 
hemispheric superiority. Two points are worth stressing. First, even if a 
task is fully lateralized (i.e., can be accomplished by only one hemisphere), 
this need not entail better performance in terms of response latency or 
accuracy for contralaterally displayed stimuli (see G. Cohen, 1982, for an 
excellent discussion of this point). Second, besides hemispheric superiority, 
a number of factors can determine a hemifield advantage. This point has been 
repeatedly stressed by Bryden (1978, 1982; see also Bertelson, 1982, for a 
similar case). Our venture at interpreting the contradiction between the 
results of Besner et al. (1979) and Peereman and Holender (1984) as resulting 
from a combination of different compatibility effects is a good example of 
such an alternative approach. 

Be that as it may, the logic underlying the study of lateralized 
presentations of different types of script implies that the results of an 
experiment should show some kind of interaction between hemifield and type of 
script. Three different interactive patterns could emerge: (a) opposite 
visual field advantages for each type of script, (b) no field advantage for 
one type of script and a field advantage for the other, (c) different degrees 
of field advantages in the same direction for both scripts. The first pattern 
is called a nonordinal interaction because each level of one factor (RVF 
vs. LVF) has an opposite effect on each level of the other factor (type of 
script). The third pattern is an ordinal interaction because the laterality 
effect has the same direction for each level of the other factor. The second 
pattern, in which there is no significant field advantage for one type of 
script, is a special case of either the first or the third pattern. Among 
these three possible interactive patterns, the first is certainly the most 
appealing because it takes the form of a double dissociation between the field 
advantages and the two kinds of stimuli, and because a nonordinal interaction 
cannot be removed by a nonlinear monotonic transformation of the dependent 
measure. The existence of such an interaction is therefore relatively 
independent of the choice of the dependent measure. 
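
The distinction between the patterns, and the sensitivity of an ordinal 
interaction to the scale of measurement, can be made concrete with a small 
Python sketch. Everything in it is an assumption made for illustration: the 
cell means are invented, the labels returned by classify are ours, and the 
logarithm merely stands in for one possible monotonic transformation of 
response latency. 

```python
import math

# Hypothetical cell means (ms); script names and values are illustrative only.
ORDINAL = {"numerals": {"LVF": 550.0, "RVF": 500.0},
           "names":    {"LVF": 660.0, "RVF": 600.0}}
NONORDINAL = {"numerals": {"LVF": 500.0, "RVF": 520.0},
              "names":    {"LVF": 640.0, "RVF": 600.0}}


def rvf_advantages(cells):
    """LVF minus RVF per script; positive values mean a RVF advantage."""
    return [means["LVF"] - means["RVF"] for means in cells.values()]


def classify(cells, tol=1e-9):
    """Label the field-by-script pattern for exactly two scripts."""
    first, second = rvf_advantages(cells)
    if abs(first - second) <= tol:
        return "no interaction"
    if abs(first) <= tol or abs(second) <= tol:
        return "one-sided"
    return "nonordinal" if first * second < 0 else "ordinal"


def transform(cells, func):
    """Apply a transformation of the dependent measure to every cell mean."""
    return {script: {field: func(rt) for field, rt in means.items()}
            for script, means in cells.items()}


for label, cells in (("ordinal", ORDINAL), ("nonordinal", NONORDINAL)):
    print(label, classify(cells), classify(transform(cells, math.log)))
```

In this example the ordinal interaction is present on the millisecond scale 
but vanishes after the logarithmic transformation, because both field 
differences correspond to the same ratio, whereas the crossover pattern keeps 
its opposite signs on either scale. 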

Claims for opposite field advantages in processing phonographic and 
logographic scripts arose from the initial observation of a RVF advantage in 
the identification of Kana words (Hatta, 1978) or nonwords (Endo, Shimizu, & 
Hori, 1978; Sasanuma, Itoh, Mori, & Kobayashi, 1977), and of a LVF advantage 
in the identification of Kanji words (Hatta, 1977a, 1977b, 1978). Since then, 
the RVF advantage for processing Kana words has been clearly confirmed. 
Moreover, Kanji words composed of more than one character are also better 
processed in the RVF. For single Chinese or Kanji logograms the results are 
more contradictory because all possible outcomes (RVF, LVF, or no field 
advantages) have been reported. In spite of this, there is still a widespread 
tendency to consider that the bulk of the evidence favors the hypothesis of a 
right hemispheric superiority for the processing of single characters (see 
Coltheart, 1983, for a recent example). From our reading of that literature, 
we believe that too many confounding factors could have flawed most of these 
results for the existing data to be conclusive. If a conclusion is 
nonetheless to be drawn, we would argue that a right hemisphere superiority 
for logographic processing is extremely unlikely (see Peereman & Holender, 
1985; Holender & Peereman, in preparation). 

Review of the Data 

The best way to characterize the investigation of lateral differences in 
the processing of numbers is to say that the data are scarce, the procedures 
diverse, and the results quite consistent. Most experiments have been 
concerned exclusively with Arabic numerals, although two have included 
alphabetically written numbers. Let us distinguish between those studies 
using response latency and those relying on response accuracy as the dependent 
variable, reviewing the latter first. 





Hines, Satz, Schell, and Schmidlin (1969, Experiment 3) inaugurated a 
series of experiments in which three pairs of numbers were successively 
presented: One member of each pair was displayed at fixation point, the other 
either 3° to the left or 3° to the right of fixation. In any particular 
trial, the lateral member of each pair was always on the same side. (A fourth 
centrally placed number was temporally interpolated between the third pair and 
recall in two subsequent studies [Hines & Satz, 1971, 1974].) The task of the 
subjects was first to recall all the central numbers, and then to recall the 
lateral ones, only trials with 100% correct central identification being taken 
into account. The results always showed an overall better recall for right 
than for left numbers. Further examination showed the RVF advantage to be 
confined to the first two pairs of a trial, the last pair showing no field 
advantage. These data are generally disqualified on the ground that the 
central task in itself can generate a RVF advantage independent of the nature 
of the stimuli (see Bryden, 1982, for a discussion of this long-standing 
debate). These data also involve a mixture of perceptual and memory processes 
without allowing us to disentangle their respective contributions to the RVF 
advantage, if any. 

However, if both members of each pair of stimuli are laterally displayed, 
one in the LVF simultaneously with the other in the RVF (rather than one 
laterally, the other centrally), a weak LVF advantage may be observed. This 
effect was not significant in Experiments 1 and 2 of Hines et al. (1969), but 
reached significance in Experiment 4 of Hirata and Osaka (1967). This LVF 
advantage could result from the strategy of report rather than from the nature 
of the stimuli, the left member of each pair being generally reported before 
the right one. 

Carmon, Nachshon, and Starinsky (1976) reported a higher percentage of 
recall for two- or four-digit numbers (represented by Arabic numerals) in the 
RVF than in the LVF with fifth- and seventh-grade children. First- and 
third-grade children were tested only with two-digit numbers, and showed no 
field advantage. Hatta and Dimond (1980) also reported better RVF recognition 
of six-digit numbers with adult Japanese and English subjects. However, this 
RVF advantage might be caused by the combinatorial process involved in forming 
multidigit numbers rather than by the logographic nature of the 
representation. 



Yet Besner, Daniels, and Slade (1982, Experiment 1) obtained a very 
large RVF advantage with single-digit Arabic numerals, right presentations 
leading to 80% correct responses and left presentations to only [...]%. In 
their second experiment, they tested Japanese and Chinese subjects with both 
Kanji numbers and Arabic numerals. This time the 14% RVF advantage for Arabic 
numerals was less pronounced than in Experiment 1. Overall performance with 
Kanji numbers was much lower than with Arabic numerals, but a 6% RVF 
advantage was again observed. It is a pity the authors limited their material 
to the numbers 4 to 9. Remember that in Kanji, the first three numbers are 
concrete representations of the quantity they denote, being composed of one, 
two, or three horizontal strokes, whereas the other numbers are arbitrary 
symbolic representations, like Arabic numerals. It would have been 
interesting to compare the laterality effect in the two cases. 

We should point out that the extent to which the huge laterality effect 
obtained in these experiments was caused by physical characteristics of the 
displays is not known. Given the importance of visual parameters in 
determining visual field advantages (Sergent, 1983a, 1983b), this point is 
worth stressing. Most studies resort to stimuli physically smaller than the 
Arabic numerals, subtending 5.9° x 8.5°, [...].6° x 5.7°, and 2.0° x 3.3°, in 
Besner et al.'s Experiment 1 and than the 10.6° x 10.6° stimuli of their 
Experiment 2. These stimuli were centered 8.8° to the left or right of 
fixation. The exposure duration was individually adjusted to yield an overall 
performance of 50 to 60% correct responses, mean durations being 32, 44, and 
56 ms for small, medium, and large stimuli in Experiment 1, and 54 ms in 
Experiment 2. Finally, a 50-ms patterned mask immediately followed stimulus 
presentation, which is also unusual. 

We now turn to studies in which response latency was the dependent 
variable. Naming latencies for Arabic numerals showed no field advantage in 
the experiment of Gordon and Carmon (1976) and a small, but significant, 10-ms 
RVF advantage in Experiment 3 of Geffen, Bradshaw, and Wallace (1971). 
Procedural differences between experiments inspire no special comments. The 
main parameters of the task were, for Gordon and Carmon (1976) and Geffen et 
al. (1971), respectively: 7 and 4 different stimuli, exposure durations of 
100 and 160 ms, stimulus visual angles of 2° and 0.5°, and eccentricities of 
3° and 4°. 

With two-choice manual response tasks involving only two Arabic numerals, 
a significant 13-ms RVF advantage was found by Geffen et al. (1971, Experiment 
5) and a similar but not significant 14-ms advantage was reported by G. Cohen 
(1975) in her cued condition. Cohen mixed three different representations of 
the numbers 4 and 5: Arabic numerals, their English names presented 
vertically, and the corresponding patterns of dots found on a die. Subjects 
were either cued or not cued about the specific representation to be used on 
each trial. Under precuing, number names yielded a slightly greater RVF 
advantage (20 ms) than Arabic numerals, and dots showed a nonsignificant 12-ms 
LVF advantage. Without cuing, there was no field difference, whatever the 
type of stimuli. 

Classification tasks also yield a small RVF advantage with Arabic 
numerals. Geffen, Bradshaw, and Nettleton (1973) used a many-to-one stimulus 
response mapping in a go-no go task involving four numbers and one vocal 
response. Two Arabic numerals called for the response "bong" and two others 
required no response. This yielded a 16-ms RVF advantage. In a 
number-nonnumber classification modeled on the classical lexical decision 
task, Peereman and Holender (1985) showed a significant 13-ms RVF advantage 
for Arabic numerals and a significant 26-ms advantage in the same direction 
for alphabetically written number names, the interaction between visual field 
and type of script being nonsignificant. 

Tasks involving more complex decisions than those just described have 
been almost exclusively concerned with numerical size comparison judgments 
(Besner et al., 1979; Katz, 1980, 1981; Peereman & Holender, 1984); they 
were reviewed and discussed in the preceding main section. There is only one 
more study to mention. Hatta (1983, Experiment 1) orthogonally varied the 
numerical size and the physical size of each member of laterally displayed 
pairs of Arabic numerals, asking his Japanese subjects to perform a congruity 
judgment. An overall 29-ms RVF advantage ensued. A 47-ms RVF advantage also 
showed up when the same task was performed with the Kanji logograms denoting 
million units (Experiment 3). By contrast, in judging the congruity between 
the relative physical size of pairs of logograms and the relative physical 
sizes of the referents (Experiment 2), the subjects showed a 26-ms LVF 
advantage. The author interpreted his results as evidence that the 
comparative judgment is based on different types of mental representation in 
dealing with Kanji object names and with numbers, but that in this latter 
case, the surface form of the stimuli (Arabic numerals vs. Kanji words) is 
immaterial. 

Interpreting the Results 

There is nothing to indicate that opposite visual field advantages for 
each kind of script, the first possible pattern of results mentioned above, 
will ever be found in contrasting numbers written logographically and 
phonographically. On the contrary, both surface forms lead to RVF advantages. 
If the LVF advantage sometimes reported with single Chinese logograms is 
valid, then Arabic numerals belong to a small class of logograms behaving 
differently as regards laterality, as is also suggested by the results of 
Hatta (1983). 

To date, there has been no report of a significant ordinal interaction 
between visual field and type of script, but the prospect of finding one is 
quite good. With Arabic numerals and simple tasks like naming or 
categorizing, a RVF advantage of 10 to 15 ms is typically found; this is the 
lower bound for the effect to be statistically significant. On the other hand, 
Peereman and Holender (1985) pointed out that the magnitude of their RVF 
advantage (26 ms) for numbers written alphabetically was more substantial and 
well within the range of the large RVF advantages typically reported in 
lexical decisions involving larger classes of words. Hence, there would be 
nothing very unexpected if a statistically more powerful study in the future 
came up with a significant ordinal interaction indicating a larger RVF 
advantage for alphabetic number names than for Arabic numerals, both RVF 
advantages being significant. 

Let us assume that the ordinal interaction has indeed been found. What 
information, and how much, would then have been gained regarding logographic 
and alphabetic number processing? To answer this question, we will be obliged 
to integrate laterality research into the broader framework of mainstream 
information processing analysis, a highly desirable but so far unfulfilled 
accomplishment (Allen, 1983; Bertelson, 1982). Bertelson optimistically 
closed his recent analysis of laterality research with the words "Progress can 
be expected, provided laterality research is conducted as an integral part of 
the study of human cognition" (1982, p. 203). Taking a few steps in this 
direction in search of an answer to the question asked at the outset of this 
paragraph, we came up with a more distressing conclusion (Peereman & Holender, 
1985). An analysis similar to that leading to this conclusion will now be 
presented. 

The aim of the analysis is to show how the ordinal and nonordinal 
interactions described in the rationale for the approach can be interpreted in 
the relatively constrained framework of an analysis of reaction time. 
The two basic assumptions are as follows: 

1. Response latency can be decomposed into a series of additive 
component durations corresponding to different stages of processing. For 
additivity to hold, the processing stages should be strictly serial, each 
stage starting only when the preceding stage has provided an output. Under 
these constraints, any modification of the duration of one particular stage 
under the influence of any factor (e.g., hemifield of presentation) should be 
reflected in the response latencies. This is one of the assumptions 
underlying Sternberg's (1969, 1984) additive factor method, one of the most 
popular methods of analysing processing into components. 

2. Hemispheric specialization is relative rather than absolute: Each 
hemisphere can perform the task, but one is more efficient than the other. 
This is more reasonable than the alternative assumption of absolute 
hemispheric specialization (only one hemisphere can perform the task), which 
would imply that the difference in latency between visual fields is due to the 
time needed to transfer information from one hemisphere to the other when the 
stimulus is displayed on the wrong side. This alternative is unlikely because 
it would entail a relative constancy across experiments in the magnitudes of 
the difference between response latencies in the two fields, which is hardly 
the case (G. Cohen, 1982). 
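
The two assumptions can be written compactly. The notation below does not 
come from the studies cited: the symbols t_i, f, Delta, and tau are introduced 
here only for illustration, with a positive Delta standing for a RVF 
advantage. 

```latex
% Assumption 1: strictly serial stages, so that stage durations add.
RT(f) = \sum_{i=1}^{n} t_i(f), \qquad f \in \{\mathrm{LVF}, \mathrm{RVF}\}

% The field advantage is then the sum of the per-stage field differences.
\Delta = RT(\mathrm{LVF}) - RT(\mathrm{RVF})
       = \sum_{i=1}^{n} \bigl[ t_i(\mathrm{LVF}) - t_i(\mathrm{RVF}) \bigr]

% Alternative rejected under assumption 2 (absolute specialization):
% the only field-dependent cost would be a transfer time \tau, so that
|\Delta| = \tau \quad \text{in every experiment.}
```

A stage that is neutral with respect to laterality contributes a zero term to 
the sum, so only lateralized stages can produce a field advantage. 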

Within this framework, the simplest possible account for the presence of 
an interaction, either ordinal or nonordinal, between visual field and type of 
script requires the addition of two specific assumptions. (1) All processing 
stages are neutral with respect to laterality save one, or at maximum two (let 
us call them Stages A and B), which can be either neutral or lateralized 
according to circumstances. In the neutral state of a stage the operations 
performed during that period take the same mean amount of time in each 
hemisphere. If a stage is lateralized, the corresponding operations are 
performed faster in one hemisphere than in the other one. (2) We cannot 
exclude a priori the possibility that (a) both Stage A and Stage B are 
lateralized on the same side for both types of stimulus, or (b) that each 
stage is neutral for one type of stimulus and lateralized for the other, or 
(c) that the two stages are lateralized in opposite directions for each type 
of stimulus. Within these constraints, each pattern of interaction can be 
realized in three extreme ways according to the following principles. In each 
of the three cases, the ordinal interaction is labelled 1 and the nonordinal, 
2: 

A. Only Stage A is lateralized, Stage B is neutral. 

1. Stage A is left-lateralized for both kinds of number representations, 
but the magnitude of the RVF advantage depends on the surface form of the 
stimuli, being larger for alphabetical number names than for Arabic 
numerals. 

2. Stage A is left-lateralized for alphabetical number names and 
right-lateralized for Arabic numerals. 

B. Stage A is left-lateralized for both scripts and produces the same degree 
of RVF advantage for both scripts. 

1. Stage B is neutral for Arabic numerals and left-lateralized for 
alphabetical number names; hence, Stage B adds its RVF advantage to that 
of Stage A, producing the required interaction. 

2. Stage B is neutral for alphabetical number names and 
right-lateralized for Arabic numerals. Stage B produces a LVF advantage 
sufficient to supersede the RVF advantage caused by Stage A. 

C. Stage A is neutral for alphabetical number names and Stage B is neutral 
for Arabic numerals. 

1. Both stages are left-lateralized when dealing with their specific 
stimulus type. It just happens that the RVF advantage due to Stage B is 
larger than that produced by Stage A. 

2. Stage A is right-lateralized for Arabic numerals and Stage B is 
left-lateralized for alphabetical number names. 

Within the constraints defined above, one can easily find the different 
possible interpretations corresponding to the remaining interactive pattern, 
in which only one type of script shows a visual field advantage. Similarly, 
the different possibilities corresponding to an absence of interaction can 
also be worked out. 
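
Cases A, B, and C can be made concrete with a short Python sketch. It is an 
illustration only: the millisecond contributions of each stage are invented, 
negative values stand for a LVF advantage, and the stage contributions are 
assumed to add, as in the first basic assumption. 

```python
# Hypothetical illustration of cases A, B, and C. Each entry gives the assumed
# RVF advantage (ms) contributed by Stage A and by Stage B for each script;
# negative values stand for a LVF advantage. All numbers are invented.
CASES = {
    "A1": {"numerals": (10, 0),   "names": (25, 0)},
    "B1": {"numerals": (10, 0),   "names": (10, 15)},
    "C1": {"numerals": (10, 0),   "names": (0, 25)},
    "A2": {"numerals": (-10, 0),  "names": (25, 0)},
    "B2": {"numerals": (25, -35), "names": (25, 0)},
    "C2": {"numerals": (-10, 0),  "names": (0, 25)},
}


def observed_advantages(case):
    """Net RVF advantage per script: the stage contributions add (serial stages)."""
    return {script: stage_a + stage_b for script, (stage_a, stage_b) in case.items()}


for name, case in CASES.items():
    print(name, observed_advantages(case))
```

In this sketch the three ordinal cases produce the same observable pair of 
field advantages, as do the three nonordinal cases, so the observed 
interaction alone cannot tell the underlying stage structures apart. 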

In performing a similar analysis (Peereman & Holender, 1985), we showed 
that a significant ordinal interaction is no more informative than a 
nonsignificant interaction. We now extend this conclusion by showing that the 
favorite nonordinal interaction is no more informative than the ordinal. 
Using the simplest possible model for the organization of processing 
operations, and looking at the interaction between visual field and stimulus 
type, we always come up with three different possible interpretations. In 
other words, laterality as a tool for analysing processing into components 
simply fails to do its job. One can, of course, retort that nobody ever 
pretended to disentangle these various alternatives by using the laterality 
approach. The point would be well taken, but then what is the purpose of 
printing all these beautiful phonograms and logograms in the left or in the 
right visual field? To avoid such criticisms researchers using the laterality 
methodology should be much more explicit about their goals than they usually 
are. 

Number Processing After Brain Injury 

The discussion of the data provided by brain-damaged people is divided 
into two parts. In the first, all patients have lesions affecting different 
language areas of the left hemisphere. These patients display a variety of 
aphasic troubles, including alexia with agraphia. Potentially, the 
investigation of these patients can teach us something about the way different 
functional processing systems can break down, but the respective role played 
by each hemisphere in determining the preserved aspects of performance cannot 
be ascertained. In the second part of the discussion, data concern the 
partial or total disconnection of the right hemisphere from the left. These 
data can potentially tell us something about the competence of the right 
hemisphere in dealing with different representations of single-digit numbers. 

Number Processing with Lesions Located in the Language Areas 

What is clear from the fragmentary information available is that the 
ability to read single and multidigit numbers represented by Arabic numerals 
can be somewhat preserved in patients unable to read letters and words in the 
alphabetic code. From the anatomo-clinical study of 183 retrorolandic brain 
injured patients, Hécaen, Angelergues, and Houillier (1961) concluded that the 
frequency with which letter or digit reading breaks down is different 
according to the site of lesion; this could indicate that partially different 
functional subsystems are indeed involved in each case. The same authors also 
mentioned 16 patients who, as a group, showed a relatively stronger inability 
to identify mathematical signs than Arabic numerals. A fully selective loss 
of this competence has been reported in two patients by Ferro and Botelho 
(1980). Unfortunately, they did not investigate the patients' ability to 
identify the written names of the mathematical signs. 

We have found one patient-group study in which the ability to process
different number surface forms was investigated (Dahmen, Hartje, Bussing, &
Sturm, 1982). These authors selected three groups of 20 patients, each group
corresponding to a different pathology: Wernicke's aphasia, Broca's aphasia,
and right-sided retrorolandic lesion. These groups varied in their
identification performance for numbers (chosen in the set 1 to 25), but showed
no difference according to the type of representation (Arabic numerals or
their German names). The mean numbers of correct identifications (out of
[illegible]) for Arabic numerals and number names were 13.3 and 12.0, 16.2 and
14.[illegible], and 19.7 and 18.0, for Wernicke's aphasics, Broca's aphasics,
and patients with a right-side lesion, respectively. The same was true in a
numerical size comparison task in which the patients had to point to the
larger number in pairs of numbers. For Arabic numerals and number names the
mean numbers of correct responses were 8.7 and 8.6, 14.2 and 14.3, and 16.2
and 15.9, for Wernicke's aphasics, Broca's aphasics, and patients with a
right-side lesion, respectively. Three points should be stressed. First, the
difference in performance between Broca's and Wernicke's aphasics is in the
direction expected on the basis of the overall differences exhibited by these
patients in terms of language comprehension. Second, the results of the
comparison task confirm the trend we observed with normal subjects in being
unaffected by the surface form of the numbers. Third, unlike the
identification process, the comparison process seems to require the integrity
of the right hemisphere, as indicated by the difference in performance in each
task in the group of patients suffering from a right-side lesion.
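
As a rough arithmetic check on the second point, the comparison-task means
quoted above can be tabulated and the two surface forms compared within each
group. This is only a sketch: the variable and function names are
illustrative and are not taken from Dahmen et al. (1982), and the figures are
the group means as quoted in the text, not raw data.

```python
# Mean correct responses in the numerical size comparison task, as quoted
# in the text: (Arabic numerals, number names) for each patient group.
comparison_means = {
    "Wernicke's aphasia": (8.7, 8.6),
    "Broca's aphasia": (14.2, 14.3),
    "Right-side lesion": (16.2, 15.9),
}


def surface_form_gaps(means):
    """Return the Arabic-minus-name difference per group, to one decimal."""
    return {
        group: round(arabic - name, 1)
        for group, (arabic, name) in means.items()
    }


gaps = surface_form_gaps(comparison_means)

# Largest within-group difference between the two surface forms.
largest_gap = max(abs(gap) for gap in gaps.values())

# Spread of the Arabic-numeral means across the three groups.
arabic_means = [arabic for arabic, _ in comparison_means.values()]
group_spread = round(max(arabic_means) - min(arabic_means), 1)

print(gaps, largest_gap, group_spread)
```

With these figures the largest difference between surface forms within a
group is 0.3, whereas the Arabic-numeral means differ by 7.5 between the
lowest and the highest group.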

Most of the data reviewed so far are based on the comparison of the mean 
performances of groups of patients. Caramazza (1984) has recently pointed out 
that such an approach is ill-suited for addressing the issue of the analysis 
of cognitive processes because the patients included in a given group could 
differ greatly in terms of the mechanisms underlying their performance. The 
remaining data come either from single cases or from very small, relatively 
homogeneous groups of patients, whose individual symptomatology is generally
available. 

The few single case studies to mention in closing this subsection all
concern Japanese patients who offer the additional interest of being able to
show a dissociation between the processing of two forms of logographic script
(Kanji words and Arabic numerals). One such aphasic patient was described by
Sasanuma and Monoi (1975). He was severely impaired in language
comprehension, whether spoken or written. His most prominent symptom was his
greater ability to read aloud Kana than Kanji words, though with little
comprehension in either case. This dissociation is extremely rare in Japanese
aphasics, most of whom show a differential ability to process each kind of
script, being better with Kanji than with Kana words (Sasanuma, 1974a, 1975).
Aside from that, the patient was able to carry out arithmetical operations,
and he could read and understand Arabic numerals.

The other Japanese patients are alexics with agraphia. Strictly
speaking, the syndrome consists of a selective impairment in reading and
writing, unaccompanied by any trouble in spoken language comprehension and
expression. However, this ideal definition almost always overstates the true
state of spoken language performance. It would be closer to reality to say
that the most prominent syndrome coexists with mild aphasic troubles (e.g.,
Hécaen & Kremin, 1976). In these cases, reading impairment is always stronger
in Kana than in Kanji. Yamadori (1975) reported one such patient who was
severely impaired in calculation and in number reading. Yamadori also
summarized two other reports published in Japanese (Kotani, 1935; Ohashi,
1965, cited by Yamadori, 1975) concerning two other cases of alexia with
agraphia accompanied by strong calculation impairments.

Sasanuma (1974b) described a case of alexia with transient agraphia. The
patient's reading in both Kana and Kanji was strongly deficient, performance
in Kanji being a little better than in Kana. "Reading of digits, both Arabic
and Chinese was impaired also" (Sasanuma, 1974b, p. 93), but less than for
Kana and Kanji. The patient was good at mental calculation, but written
calculation was hampered by his reading problem. Six months later almost all
symptoms other than alexia had disappeared, but nothing more specific was
stated.

Because of his preservation of mental calculation, the patient of
Sasanuma (1974b) is sometimes considered as a counter-example to the
observations of Yamadori (1975). Such cannot be the case because it is
extremely unlikely that the two patients suffered from the same pathology.
Yamadori's patient was alexic with agraphia, whereas the patient of Sasanuma
presented all the symptoms of an alexia without agraphia (see next subsection)
in which preservation of mental calculation is typical (e.g., Geschwind, 1965;
Symonds, 1953).

To sum up, most patients with lesions affecting the language areas of the
left hemisphere show various degrees of disintegration of their mathematical
abilities and a poor ability to read Arabic numerals.

Number Processing by the Disconnected Right Hemisphere

For theoretical reasons that will become apparent as we proceed, it is
convenient to examine successively the data from patients having one of the
following characteristics: (a) alexia without agraphia, (b) section of the
splenium (posterior part) of the corpus callosum, (c) commissurotomy, and (d)
hemispherectomy.

Alexia without Agraphia. An ideal patient with alexia without agraphia
cannot read, but can write spontaneously and to dictation, without being able
to reread what he or she has written. Such a patient has no trouble in spoken
language expression or comprehension, but has some difficulties in visual
object naming and a strong impairment in color naming; mental calculation is
preserved. The classical account of the syndrome by the pioneer neurologists,
as revived and specified by Geschwind (1965), describes isolation of the
intact left angular gyrus from each occipital visual cortex. This condition
is caused by (a) destruction of the left visual cortex or of the connections
between the left visual cortex and the left angular gyrus and (b) destruction
of the splenium of the corpus callosum, which cuts off from the left
hemisphere the visual information reaching the right intact hemisphere. The
essence of the trouble is, therefore, the disconnection of intact language
zones from the visual world (but not from the auditory or somatosensory
world). Logically, the lesions should entail an incapacity to name any visual
scene; this is not the case, although it is partially realized by some
difficulty in object naming and a very poor ability to name colors. The
supplementary hypothesis needed to account for the preserved ability to name
objects is that objects can be recognized (but not named) in the right
hemisphere and that it is this interpreted information, not the visual
information, that is transmitted to the left hemispheric language areas
through the intact anterior portion of the corpus callosum. We assume that
the right hemisphere is unable to provide verbal responses (see below). By
extension, any preserved reading performance (e.g., for Arabic numerals)
should reflect right hemisphere competence in processing the information. The
same rationale is used by Coltheart (1980, 1983) in his attempt to account for
deep dyslexics' preserved reading competence (the etiology of this syndrome is
different from that of alexia without agraphia).

The recent literature has usually described four of the six alexic
patients of Hécaen and Kremin (1976) as displaying the symptomatology required
to fit the ideal model. They all performed better in dealing with Arabic
numerals than with single letters or single words. Close examination of the
constellation of symptoms displayed by these patients reveals that their
classification is problematic: Part of their deficit could well be due to
some lesion in the language area of the left hemisphere as well. Lack of
space precludes any full analysis of this very complex question here. Only a
brief account sufficient to make the point will be presented, but it should be
kept in mind that the inclusion of a patient in one of the subgroups of Table 2
is often tentative because we generally lack the decisive anatomo-clinical
data to remove the uncertainty. For example, the inclusion of Stengel, Vienna,
and Edin's (1948) two patients in the group consisting of close to ideal cases
would be disputed by Oxbury, Oxbury, and Humphrey (1969).

Table 2 includes many of the cases of alexia without agraphia reported in
the literature published in English between 1948 and 1976. All the tabulated
cases are bad at reading words, and most of them are relatively better at
reading Arabic numerals than letters. They can be further differentiated on
the basis of several features, among which four have been selected for the
present discussion. These features are (a) presence or absence of a right
hemianopsia, (b) color naming performance, (c) spelling performance, and (d)
mental calculation performance. Spelling is evaluated either by the ability
of a patient to spell and to recognize orally spelled words or by his use of a
spelling strategy in attempting to read words. A good mental calculation
performance indicates that very simple arithmetic operations can be performed.
Here follows the description of the groups.

Group 1: These patients are close to the ideal model in showing an
anatomically verified (Cumming, Hurwitz, & Perl, 1970; Geschwind & Fusillo,
1966) or presumed brain infarction (Benson, Brown, & Tomlinson, 1971, Cases 1
to 3; Holmes, 1950; Kreindler & Ionasescu, 1961; Oxbury et al., 1969, Case
1; Sasanuma, 1974b; Stengel et al., 1948). The infarction is of the left
occipital lobe (responsible for the hemianopsia) and of the splenium, both
caused by some pathology of the left posterior cerebral artery. The 11
patients show a very consistent pattern of results. They all exhibit the
right hemianopsia expected from their lesion in the left visual cortex. When
investigated, spelling and mental calculation are always good and color naming
is always bad except in one case (Cumming et al., 1970). Even the last case
does not cast doubt on the homogeneity of this group of patients because the
dorsal part of the splenium was preserved in this patient; this could allow
for a transfer of the visual color information from the right occipital lobe
to the left angular gyrus (see Greenblatt, 1973, for further discussion of
this point). It is clear that the case of Sasanuma (1974b) reviewed in the
preceding subsection fits perfectly well in this group of patients. It could
be tentatively concluded that the right hemisphere of these patients has a
much better ability to identify Arabic numerals than letters or
phonographically written words.

Group 2: The four cases included in this group are remarkable for their
lack of right hemianopsia, indicating an intact left visual cortex. This can
be related to the etiology of their trouble, which is different from that of
patients in Group 1. The cause of the alexia was either a surgical removal of
a vascular anomaly (Ajax, 1967, Case 1), a carbon monoxide intoxication
(Goldstein, Joynt, & Goldblatt, 1971), a head trauma (Heilman, Safran, &
Geschwind, 1971, Case 1), or a tumor (Greenblatt, 1973). In the last case
anatomical analysis of the brain showed that the tumor had destroyed the
splenium of the corpus callosum and part, but not all, of the connections
between the intact left visual cortex and the left angular gyrus. One can
tentatively hypothesize that the connections needed to transmit color
information to the left angular gyrus were also preserved in two of the other
patients. If this were indeed the case, these four patients, showing good
spelling and good mental calculation, could be considered examples of the
ideal model of alexia without agraphia as good as those of Group 1. However,
it seems unlikely, in view of the etiologies of these alexias, that the brain
damage was really so selectively localized. We cannot preclude the
possibility that the language areas have been more or less affected as well,
rendering those cases potentially less conclusive than those of Group 1 with
respect to the assessment of right hemisphere competence in number
identification.

Group 3: The patients in this group are certainly the least appropriate
for our purposes because their left occipital lesion extended to the parietal
lobe as well and because we generally do not know whether the splenium was
damaged or not (Ajax, 1964, Case 1; D. N. Cohen, Salanga, Hully, Steinberg,
& Hardy, 1976; Kinsbourne & Warrington, 1964; Warrington & Zangwill, 1957).
The patient of Caplan and Hedley-Whyte (1974) had an anatomically verified
lesion of the left occipital cortex and of the splenium, but she suffered from
additional small left parietal lesions. This can explain why this patient had
[illegible], finger agnosia, and left-right confusion. The heterogeneity of
performance in this group contrasts with the homogeneity of performance of
the Group 1 patients, which could support the idea either of a pathology
different from alexia without agraphia, or, at least, of the existence of
supplementary problems compared with the ideal model. Therefore, these data
cannot safely be used to infer anything about right hemisphere competence.

Group 4: This group includes the six alexics studied by Hécaen and
Kremin (1976). It is immediately apparent that all six patients would fall in
our third group, even though three of them are generally considered as ideal
cases of alexia without agraphia (CRO, DEL, SAL). CLI, who is slightly
agraphic, is generally assimilated to the nonagraphic patients, whereas BLA
and MAG are dissociated from them on the basis of their strong agraphia. It
is clear that none of these patients presents the profile of those of Group 1
in having good spelling and mental calculation and poor color naming. Within
the group of four patients with a right hemianopsia, only SAL is attested as
suffering from a lesion of vascular origin sufficiently selective to affect
only the occipital lobe. Although his performances depart from those of
patients in Group 1, he is the only one whose inclusion in this group might be
defended.

In summary, we can probably safely rely on the patients of Group 1 in
attempting to assess the right-hemisphere ability to process visual symbols
semantically. Some other cases are probably valid as well, but we have enough
patients in Group 1 to adopt a conservative position, excluding all others
from further discussion.

Section of the splenium of the corpus callosum. The logic of the
interpretation of alexia without agraphia implies that, under LVF
tachistoscopic presentations, a patient whose only lesion is a section of the
splenium of the corpus callosum should exhibit exactly the same reading
performance as an alexic without agraphia. To our knowledge, only six such
cases have been reported, three of them being examined at a time at which the
hemifield presentation technique was not well developed. All six cases had
their splenium severed in the process of removing a small subcortical tumor.
The patient of Trescher and Ford (1937) could not recognize letters presented
in the LVF (for what duration?), but other symptoms, such as a left hand
astereognosis, did not guarantee that the splenium section was the only damage
suffered by the patient. The two patients studied by Maspes (1948) did not
present any hand astereognosis (for wooden letters). Letters and Arabic
numerals presented for 1 or 0.5 sec were very well recognized in the RVF, but
not in the LVF, as expected. Three similar Japanese cases have been reported
in the recent literature (Sugishita, Iwata, Toyokura, Yoshioka, & Yamada,
1978). With RVF brief presentations ([illegible] ms), oral reading and
comprehension of both Kana and Kanji words were almost perfect. With LVF
presentations, performance was poorer, being at chance level for Kana, but
somewhat better than chance for Kanji. Moreover, performance improved
relatively more with Kanji than with Kana when the same material was retested
two or three times at intervals of several months. Unfortunately, numbers
were not tested.

Sugishita et al. (1978) interpreted their results as showing that the
right hemisphere can understand the logographic Kanji better than the
phonographic Kana; this is consonant with the better recognition of Arabic
numerals than of letters or alphabetically written words by alexics without
agraphia. It is unfortunate that Maspes (1948) did not systematically
investigate the difference in performance for letters and Arabic numerals.
Sugishita et al. also assumed that the vocal response was given by the left
hemisphere, not by the right one. This implies that, however accurately
identified the LVF stimuli were, naming could not have been achieved at all if
the corpus callosum were completely sectioned. This brings us to the next
stage in our review.

Commissurotomy. The initial investigation of patients having undergone a
complete section of the interhemispheric commissures revealed the very poor
language competence of the right hemisphere in most cases (see Gazzaniga,
1970). However, two patients of the California series (L.B. and N.G.) and
three patients of the East Coast series (P.S., J.W., and V.P.) show
considerable right hemisphere language comprehension (both spoken and
written). In addition, within two years following commissurotomy, P.S. and
V.P. have developed the ability to access speech from the right hemisphere
(see Gazzaniga, 1983). All these patients have a complete section of the
corpus callosum and the hippocampal commissure. Most patients of the
California series, including L.B. and N.G., also have a section of the
anterior commissure, whereas most patients of the East Coast series, including
P.S., J.W., and V.P., do not.

Recognition performance for letters and Arabic numerals was investigated
in six patients of the California series by Teng and Sperry (1973). In one
condition, pairs of letters or of Arabic numerals were presented either in the
LVF or in the RVF, calling for a verbal report. With RVF presentations 86% of
the letters and [illegible] of the digits were named correctly, whereas with
LVF presentations these scores dropped to 13% and 35%, respectively. Notice
that one patient, N.W., made 100% errors with LVF presentations of both
letters and numerals and that L.B. reported only 22% of letters, but 80% of
Arabic numerals from the LVF. In another experiment involving fewer numerals
(Gazzaniga & Hillyard, 1971), L.B. described a strategy of enumerating the
numbers and stopping when the response popped out, which the authors found
compatible with the idea that the response was actually generated in the left
hemisphere through cross-cuing with the right hemisphere. This strategy
should be easier to use with single-digit numbers than with letters because
the set is smaller in the former than in the latter case. Whether L.B. or the
other often tested patients used in Teng and Sperry's (1973) experiment were
using a similar strategy is not known, but it cannot be ruled out. Hence,
these data are not strong enough either to challenge the hypothesis of the
muteness of the right hemisphere, or to provide unequivocal evidence of a
greater intrinsic ability of this hemisphere to deal with Arabic numerals than
with letters.
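
The set-size argument can be made concrete with a small sketch. It assumes,
hypothetically, that the patient recites the candidates in a fixed order and
that every target is equally likely; neither assumption comes from the
studies cited, and the function names are illustrative only.

```python
from fractions import Fraction

# Sizes of the two symbol sets discussed in the text.
DIGITS = 10
LETTERS = 26


def chance_rate(set_size):
    """Probability of naming the right symbol by blind guessing."""
    return Fraction(1, set_size)


def mean_enumeration_steps(set_size):
    """Average number of items recited before the target is reached.

    Assumes a fixed recitation order and equally likely targets, so the
    target sits at position 1..set_size with equal probability.
    """
    return Fraction(set_size + 1, 2)


for name, size in (("digits", DIGITS), ("letters", LETTERS)):
    print(name, float(chance_rate(size)), float(mean_enumeration_steps(size)))
```

On these assumptions a digit is reached after 5.5 recited items on average,
against 13.5 for a letter, which is one way of stating why the enumeration
strategy should be easier with digits.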

Gazzaniga and Smylie (1984) tested two of the right-hemisphere
language-proficient patients of the East Coast series, V.P. and J.W. Both
patients showed errorless performance in multiple choice pointing to numbers
presented to the right hemisphere (LVF). V.P. was also able to read these
numbers aloud perfectly well, whereas J.W. was completely unable to do so.
Both patients showed extremely poor performance in carrying out simple
arithmetic operations with the right hemisphere.

The data of Gazzaniga and Smylie (1984) are compatible with the idea that
the left hemisphere normally subserves calculation (see preceding subsection)
and that the right hemisphere can identify numbers. However, multidigit
number identification is probably better in these two patients than in most
patients showing the ideal symptomatology of alexia without agraphia. The
extent to which these data can be generalized to the entire population of
commissurotomized patients, and, a fortiori, to normal people, is debatable
(see Gazzaniga, 1983, and Zaidel, 1983, for somewhat opposite views on this
question).

Hemispherectomy. The last logical step in this story is to assess the
number processing ability of a completely isolated right hemisphere, the left
hemisphere having been removed completely (actually the left cortex, the left
subcortical structures being almost entirely preserved).

Gott (1973) described such a patient who underwent a left hemispherectomy
at the age of 10 years, because of malignancy. She had already undergone
brain surgery at the age of eight for removal of a tumor in the left
ventricle. When she was tested two years after the hemispherectomy, she
showed good comprehension of spoken language but very poor verbal expression
(mainly single words or short stereotyped sentences) and very poor reading of
single words. She was unable to name a single letter presented visually or to
choose above chance (30% correct) which of four visually displayed letters was
the one just spoken by the experimenter, but she performed much better in this
task when Arabic numerals were used instead of letters (80% correct). When
Zaidel (1976) tested her one year later, using a similar procedure, she was a
little better in pointing to a spoken multidigit number (out of six) than to a
spoken letter.

The description given by Hillier (1954) of the performance of his patient
is more anecdotal. After three surgical interventions in the left hemisphere
during the preceding 15 months, a complete hemispherectomy was finally
performed. The patient was 14 years old at the time of the first
intervention. Each intervention left him with severe aphasic troubles,
indicating that language functions were subserved by his left hemisphere.
However, after hemispherectomy, he was described as having good comprehension
of spoken language and an ability to say some words and to read single
letters.

However poor the verbal performance of these two patients may appear, it
is nevertheless much better than would have been expected if the right
hemisphere were completely unable to subserve any linguistic function. Due to
the youth of the patients, no generalization of this conclusion is allowed
because the plasticity of the nervous system is probably still important at
that age. This plasticity is now well documented in patients who have
undergone hemidecortication because of infantile hemiplegia accompanied by
intractable seizures. It is clear that if the illness starts before the age
of one year the healthy hemisphere, whether right or left, subserves all the
functions normally shared between two hemispheres (McFie, 1961). In these
cases it requires subtle testing with tasks varying in complexity to show that
patients retaining their left hemispheres are relatively better at complex
syntax comprehension than those retaining their right hemispheres (Dennis,
1980a; Dennis & Kohn, 1975), whereas the opposite relation between relative
levels of performance holds for complex spatial tasks (Kohn & Dennis, 1974).
Hence, behind the tremendous plasticity shown by each hemisphere in
developing functions for which it is usually less proficient, there seems to
be an irreducible difference in processing ability as well.

At the other extreme, two adult left hemispherectomies, performed to
remove tumors developed during adulthood, reveal extremely poor verbal
ability, but not its complete lack (McFie, 1961). Between the age of one year
and some unknown upper limit, the brain seems to keep some of its initial
plasticity, allowing each hemisphere to develop abilities for which it is
normally not very proficient (McFie, 1961). The two patients just described
(Gott, 1973; Hillier, 1954) were probably still in this phase.

A thorough examination of the linguistic abilities of the left and right
hemispheres has been undertaken by Dennis and her colleagues (Dennis, 1980b;
Dennis, Lovett, & Wiegel-Crump, 1981; Dennis & Whitaker, 1976) on one case of
right hemidecortication and two cases of left hemidecortication performed
before the age of five months. The examinations published to date took place
when the children were between 9 and 14 years old. One fascinating finding of
these studies is that equal performances in decoding written words can be
mediated by different mental representations. The child retaining his left
hemisphere shows a good awareness of the phonological structure of language:
His reading draws on morphophonological properties of English orthography, and
he reveals a tacit knowledge of rules that map writing onto speech when he
reads new or unfamiliar words. None of these abilities is displayed by the
two children retaining their right hemispheres. Yet with known words, their
reading performance is equivalent to that of the child retaining his left
hemisphere. Only with unknown words does their performance disintegrate;
this shows that their word knowledge is not based on a morphophonological
representation, so that they cannot exploit English orthographical principles
to decode new words. These findings are remarkably well in line with the
ideas developed by Mattingly (1972, 1984) concerning the relation between
proficient reading and the availability of morphophonological representations
of words in the mental lexicon. This author has also stressed that spoken
language comprehension is probably less dependent on the existence of such
representations than is reading (Mattingly, 1984). This claim is supported by
the failure of Dennis and Whitaker (1976) to demonstrate differences between
left and right hemidecorticate children in their ability to deal
with the phonemic and semantic aspects of spoken language. We may also note
that the capacity for syntactic processing was much greater in the child
retaining his left hemisphere than in the other two children. This is
consonant with other data mentioned earlier (Dennis, 1980a; Dennis & Kohn,
1975).

Conclusions

The most important point of this section is the contrast between the
disintegration of calculation and Arabic numeral reading caused by lesions in
the language areas of the left hemisphere and the relative preservation of
these abilities by patients showing a disconnection between intact language
areas and the visual cortex.

The only point left for discussion is the interpretation of the better
identification of Arabic numerals than of letters by the 11 alexics without
agraphia of Group 1, those for whom the presumption of a pure disconnection
syndrome is most likely to be correct. All these patients had their language
functions located in the left hemisphere; they did not display any known
cerebral disorder before their alexia; and the syndrome was caused by
brain lesions in adulthood. Hence, these patients are the best suited for the
assessment of the ability of the right hemisphere to deal with visual symbols.

A first possible explanation of the better performance with Arabic
numerals than with letters is that it is simply easier to discriminate one
visual symbol out of 10 possible visual configurations than one out of 26





Both sets of symbols have
evolved from the need to allow efficient reading. They incontestably succeed
in doing so, especially under the temporally unlimited viewing conditions
typical of the neuropsychological examination.

Far more likely is that performance is determined by the extent to which
the right hemisphere can process the meaning of different stimuli. In
essence, this amounts to proposing exactly the same schema of interpretation
as that used by Geschwind (1965) to explain the differential ability to name
objects and colors. Remember that the two basic assumptions are that (a) only
the left hemisphere can generate a naming response, and (b) although visual
information reaching the right hemisphere cannot be transmitted to the left
hemisphere because the splenial route is sectioned, the information can be
transferred, once it has been given a semantic interpretation, through the
anterior intact portion of the corpus callosum. Hence, the solution of the
problem should be sought by analysing the nature of the semantic information
conveyed by letters and Arabic numerals.



The meaning of a letter is determined by the phonological unit of the
spoken language to which it refers and by its relation with other similar
units. In other words, the meaning of a letter is defined in terms of
properties that the right hemisphere is unable to process, even when it has
developed an idiosyncratic language competence due to complete loss of the
left hemisphere (Dennis et al., 1981). A fortiori, a right hemisphere that
has never faced the problem of associating sounds to letters should be even
less able to extract their meaning. This entails that, beyond the
untransmittable visual information, there is simply no other form of
information that can be conveyed to the left hemisphere. In this vein, the
very poor performance of alexics without agraphia in reading and understanding
phonographically written words argues for the hypothesis that word recognition
is mediated by letter or syllable (Kana) recognition.

By contrast, Arabic numerals have a meaning in a symbolic system that has
nothing to do with phonology. There is therefore no reason why the right
hemisphere could not generate a semantic representation of the digit and
transfer it to the other side, a task it seems able to perform with objects as
well. This explanation is consonant with the better ability of the right
hemisphere to interpret Kanji logograms than Kana phonograms (Sasanuma, 1974b;
Sugishita et al., 1978).

In concluding this section it is worth specifying the exact scope of the
interpretation of the right hemisphere's better performance with Arabic
numerals than with letters. Arabic-numeral reading in alexia without agraphia
is not always perfect, and this points to the fact that, though feasible, the
task is nevertheless strained. The inefficiency of the procedure is also
demonstrated by the fact that the reading of multidigit numbers is rarely
preserved. This would not be the case if transmission of the component
numerals to the left hemisphere were more efficient. At present, we do not
know whether the poor naming performance is caused by inadequacy of the
semantic representation generated in the right hemisphere, or by the poor
ability of the corpus callosum in transmitting interpreted rather than raw
sensory information, or both. A final point worth emphasizing is that, while
we may infer from the performance of brain damaged patients that the right
hemisphere has some ability to process Arabic numerals, we may not infer that
it is superior to the left hemisphere in doing so.

Summary and Conclusions

Let us take the different points in the reverse order to their 
presentation in the chapter. 

1. With respect to brain injured patients, the fact that (a) number
processing is strongly, but perhaps not fully, dependent on the integrity of
the language areas of the left hemisphere and (b) these areas can be
disconnected from the visual information reaching the right hemisphere, allows
us to assess the differential ability of this hemisphere to deal with various
surface forms of numbers and of other types of visual information. Hemifield
presentation of stimuli is a useful technique in this framework. An
understanding of how the information is processed will require both the
general progress of the analytical power of cognitive psychology as a whole
and the comprehension of the basic modes of processing of each hemisphere in
particular.

2. As for lateral hemifield presentation of stimuli to normal subjects,
we may doubt whether the technique will help us to achieve either or both of
the requirements just mentioned, at least insofar as one adopts a
multicomponential view of processing. The analysis of the problem presented
at the end of the third section is, of course, not the only one possible.
However, it is based on the simplest and most tractable view of processing we
have, and this casts serious doubt on the ability of the approach to fare
better in more complex theoretical frameworks. The results show an RVF
advantage for both logographic and phonographic number representations.
Whether this pattern of results should be considered at odds with claims for an
LVF advantage in the processing of logograms in general depends on the
validity of this assertion, which is still controversial.

3. As regards numerical size comparison judgments, two of the basic
effects, symbolic distance and serial position, were found to be independent
of the surface form of the stimuli in experiments published to date. By
inference from related data we hypothesized that such will also prove to be
the case for a third effect, semantic congruity, for which the information is
still lacking. The Stroop-like task leading to a size congruity effect has
been judged too complicated to provide useful, nonparadigm-bound information,
a conclusion that extends to hemifield presentations for the reasons just
invoked. The strategy of research illustrated in this approach could, of
course, be extended to cover a variety of questions about the basic knowledge
associated with single-digit numbers. One can, for instance, use the same
paradigm in comparative judgments related to the odd vs. even, prime
vs. nonprime, multiple of two vs. nonmultiple of two questions. The task need
not be an explicit comparison between two numbers; it can also take the form
of judging whether a single number possesses the property under investigation.

4. Three points should be made about the discussion of number
representations.

First, concerning multidigit number processing, it seems appropriate to
distinguish between Arabic numerals and number names irrespective of the
surface form of the latter. The facts that Arabic numerals belong to a
different, more abstract notational system and that they are also intimately
bound to mathematical activities make them a priori distinct from number
names. The irrelevance of the surface form can be further emphasized by
predicting that Japanese (or Chinese) aphasics would show the same difficulty
in transcoding multidigit numbers written in one logographic form into another
logographic form (Arabic numerals into Kanji words and vice versa) as
occidental aphasics have in transcoding these same numbers represented
logographically (Arabic numerals) into alphabetically written number names,
and vice versa (Deloche & Seron, 1982; Seron & Deloche, 1984).

Second, as soon as one focuses on the processing of single-digit numbers,
the characteristics of the notational system of which the number
representations are the elements cease to play a prominent role, whereas the
nature of the surface form of the numbers now becomes the important variable.
We should expect that access to the stored knowledge associated with these
elements would be influenced by factors affecting the reading of any kind of
word. From this point of view it is, therefore, appropriate to regroup the
symbols into a logographic and a phonographic category, irrespective of the
underlying notational system. With normal subjects, we expect the surface
form of the number to affect the speed with which their conceptual knowledge
is accessed, but we expect the characteristics of this knowledge, as revealed
by the pattern of interactions between different variables, to be the same
irrespective of the surface form of the numbers. As far as the available
evidence goes, this belief is not yet contradicted (cf. conclusion 3).

Third, single-digit number processing by adult brain-injured people could
lead to a more complex picture. One should consider two cases. If, on the
one hand, the language areas of the left hemisphere are intact, but
disconnected from the visual cortex (ideal cases of alexia without agraphia or
LVF presentations with section of the splenium of the corpus callosum), the
intact right hemisphere could translate the visual numbers into interpreted
representations transmissible to the left hemisphere, provided the numbers are
represented logographically (one should also allow for the possibility that
the right hemisphere may learn to process the small set of phonographic
numbers as if they were logograms). In this case, a task could be performed
according to the normal synergic activity of the hemispheres of an intact
brain, leading to a performance qualitatively equivalent to that of normal
subjects, save for some possible loss in efficiency. If, on the other hand,
the language areas of the left hemisphere are injured and if the task can be
performed at all, then performance should be at least partially determined by
right-hemisphere competence in dealing with numbers. A performance
qualitatively different from that of normal subjects could then be considered
an index of the idiosyncrasy of the right hemisphere's knowledge of numbers.
The nature of the right-hemisphere competence could then be studied in
commissurotomized patients, provided the cognitive capacities of the right
hemisphere of these severely epileptic people could be considered
representative of those of normal subjects.

A NOTE ON PROCESSING KINEMATIC DATA: SAMPLING, FILTERING, AND DIFFERENTIATION

B. A. Kay,† K. G. Munhall, E. V.-Bateson,†† and J. A. S. Kelso†



I. Introduction 

Until quite recently, the predominant dependent measures used in the
field of motor behavior have been product scores, such as absolute, constant,
and variable error, or discrete temporal measures such as reaction time and
movement time. Much current work, on the other hand, has emphasized the
detailed analysis of the movement itself, particularly kinematic patterns of
displacement, velocity, and acceleration over time. In addition, several
recent models of voluntary movement control use kinematic measures as their
major data base. For example, Nelson (1983) describes a number of constraints
that could govern the "economy" or "efficiency" of skilled movements.
Optimization criteria such as minimum-force, minimum-impulse, and
minimum-energy are offered as possible constraints that can, in principle, be
distinguished by subtle differences in the movement's velocity pattern.
Similarly, Hogan (1984) has proposed "minimum-jerk" as an organizing principle
for voluntary movement, a model that requires careful measurement of the first
derivative of acceleration ("jerk") for its evaluation. Finally, several
investigators have used kinematic relations (1) to uncover putative invariants
of limb (e.g., Jeannerod, 1984; Kelso, Southard, & Goodman, 1979; Soechting &
Lacquaniti, 1981) and speech articulator motions (e.g., Tuller & Kelso, 1984)
that persist in the face of systematic changes in the task; and (2) as a
window into a movement's dynamic control structure (e.g., Kelso, V.-Bateson,
Saltzman, & Kay, 1985; Ostry & Munhall, 1985).

From the above sampling of movement research, the desirability of
obtaining accurate estimates of kinematic variables is clear, particularly in
light of models that demand sensitive measurement procedures for their
evaluation. Such considerations have led us to re-evaluate very carefully the
manner in which we process movement data, the subject of the present note.

Among the many factors affecting accurate kinematic measures are the
presence of noise in the transducer signals, the conversion of analog signals
to digital form, and digital differentiation. In this paper we evaluate some
techniques used to minimize these effects with the aim of assuring ourselves
that the measurements are, in fact, valid. We concentrate mainly on
optoelectronic movement transduction data because of its common usage in our
laboratory and elsewhere. First we outline the processing sequence involved,
from movement transduction to final analyses. Then we describe in detail the
characteristics of each of these steps, in particular, those required to
minimize the effects of noise and those that produce reliable differentiated
data. We also provide evaluations of these two key steps. Our intent here is
not to provide a comprehensive survey of physiological signal processing (see
Kaiser & Reed, 1981; Winter, 1979; Woltring, 1981, for many more details).
Rather it is to ensure that our particular measurement and analysis system
provides us with accurate and interpretable data so that quantitative
statements can be made about the values of, and relations among, kinematic
variables. This discussion may also prove useful to others who are beginning
to use kinematic data extensively in their experimental investigations.

II. The Signal Processing Sequence

In a typical experiment, articulator displacement data are collected
using a position-sensitive optoelectronic transduction system, similar to the
commonly-used Selcom SELSPOT system. The output signals provide the bases on
which instantaneous velocity and acceleration are derived for more in-depth
kinematic analysis. The general sequence of operations used to obtain
movement data is as follows: During the experimental session, the analog
outputs of the transduction system are fed through a bank of amplifiers and
recorded on FM tape. Later, the FM signals are played back, converted to
digital form, and transferred to disk. Next, the digitized displacement data
are low-pass filtered (smoothed) to reduce high-frequency noise components.
The filtered displacement data are then differentiated using a
central-difference algorithm. These derived velocity data are low-pass
filtered and then differentiated using the central-difference algorithm, and
finally, these derived acceleration data are low-pass filtered. Table 1
provides a summary description of the methods we use at each of these steps.
The filtered displacement, velocity, and acceleration waveforms (movement
trajectories) are subsequently analyzed with our Waveform EditiNg and DisplaY
program (WENDY; Szubowicz, 1977) and many other waveform analysis programs.¹
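
The sequence of digital steps described above (filter, differentiate, filter, differentiate, filter) can be sketched in a few lines. The following Python fragment is only an illustration and not the laboratory's PSP or WENDY software: the function names are hypothetical, and the treatment of the ends of the series (dropping samples that lack full window coverage) is an assumption, since the text does not say how the edges were handled.

```python
# Sketch of the filter -> differentiate -> filter -> differentiate -> filter
# sequence. All names are illustrative; edge handling is an assumption.

BARTLETT_7 = [0, 1/9, 2/9, 1/3, 2/9, 1/9, 0]  # window weights listed in Table 1
H = 0.005                                      # sample interval in s (200 Hz)


def smooth(x, weights=BARTLETT_7):
    """Symmetric weighted moving average; keeps only fully covered samples."""
    half = len(weights) // 2
    return [
        sum(w * x[i + k - half] for k, w in enumerate(weights))
        for i in range(half, len(x) - half)
    ]


def central_difference(x, h=H):
    """Two-point central difference: x'(t) = (x(t+1) - x(t-1)) / 2h."""
    return [(x[i + 1] - x[i - 1]) / (2 * h) for i in range(1, len(x) - 1)]


def derive_kinematics(displacement):
    """Return time-aligned filtered displacement, velocity, and acceleration.

    Each smoothing pass drops 3 samples per end and each differentiation
    drops 1, so the outputs start 3, 7, and 11 samples into the input.
    The two longer series are trimmed to the span of the acceleration.
    """
    disp_f = smooth(displacement)
    vel_f = smooth(central_difference(disp_f))
    acc_f = smooth(central_difference(vel_f))
    n = len(acc_f)
    return disp_f[8:8 + n], vel_f[4:4 + n], acc_f
```

Applied to a parabolic displacement x(t) = t^2, this sketch returns a velocity of 2t and a constant acceleration of 2, which is a convenient sanity check before it is used on recorded data.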

III. Movement Transduction, Recording, and Analog-to-Digital Conversion 

The outputs of our optoelectronic movement transduction system
(Weyrich-Tarbcft model 400; see Kelso et al., 1985, for a description) are
analog DC signals. Since movement trajectory signals contain significant
spectral components down to 0 Hz, we use a frequency-modulation (FM) tape deck
(SE model 7000) to produce an archival record of the experimental session,
rather than an amplitude-modulation deck, which would not adequately record
such low frequencies. In order to use the full dynamic range of the recorder
(signal-to-noise ratio = 48 dB for a voltage range of 4 V), transducer
signals are routed through a bank of DC amplifiers before they are recorded.

The analog-to-digital (A/D) step is performed by a Datel model ST-PDP A/D
converter, the digitized result being stored on disk. The entire process is
controlled by the in-house Physiological Signal Processing (PSP) software
system (Gulisano, 1982) on a DEC PDP 11/45. Two parameters are of interest
here. First, the resolution of the converter is twelve bits, that is, it
produces 2^12, or 4096, discrete levels ("machine units") over an input voltage
range of +/- 10 V. The same bank of DC amplifiers used in the recording step
allows us
to exploit the full dynamic range of the converter on playback.

Table 1. Steps performed in the signal processing sequence

1) Movement transduction and recording:
   -- Transducers typically used:
      - optoelectronic system (position-sensitive)
      - potentiometers
      - POLGON polarized light goniometers
   -- Recorder:
      - SE 7000 16-track FM tape recorder

2) Playback and A/D conversion:
   -- Datel ST-PDP twelve-bit A/D converter and PDP 11/45 with the PSP system
   -- Sampling rate = 200 Hz

3) Low-pass filter:
   -- 7-point Bartlett (triangular) window
   -- sample:  t-3   t-2   t-1    t    t+1   t+2   t+3
      weight:   0    1/9   2/9   1/3   2/9   1/9    0
   -- Response:
      - cutoff frequency (1/2 amplitude, -6 dB) = 31 Hz
      - phase shift = constant = 0

4) Differentiation routine:
   -- Two-point central-difference algorithm:
         x'(t) = (x(t+1) - x(t-1))/2h
         (h = sample interval = 5 ms)
   -- sample:   t-1     t+1
      weight:  -1/2h   +1/2h
   -- Response:
      - within 10% of ideal at 25 Hz
      - 3 dB attenuation at 44 Hz
      - 6 dB attenuation at 60 Hz
      - phase shift = constant = 90 degrees

Twelve bits of resolution corresponds to a maximum signal-to-noise ratio of
4096:1, or 72 dB, greatly exceeding that of the FM signal. However, an
additional source of noise enters here, since the resolution of each sample is
limited to twelve bits: The difference between the real voltage (theoretically
of infinite precision) and its finite-precision digital representation is
known as quantization noise, a factor that poses a problem for deriving
velocity and acceleration data. Second, the rate at which the signal is
sampled must be carefully chosen, since the highest frequency component one
can reliably represent in a sampled signal is exactly one-half the sampling
rate, the Nyquist frequency. The spectral components of any time-series are
unknown a priori, but, according to Woltring (1981), a general guideline for
reasonable sampling rates is twice the frequency at which noise becomes the
dominant characteristic. Figures 1a and 1b show the amplitude spectra of a
typical lip movement sequence (the utterance /epaepaepe/) and a calibration
trial (no movement), respectively, the latter representing the baseline noise
present in the system. No divergence from the noise floor can be seen beyond
10 Hz, confirming that our sampling rate of 200 Hz amply satisfies the above
criterion.
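
The arithmetic behind the two converter parameters can be made explicit. The short Python sketch below recomputes the figures quoted in the text (4096 levels, roughly 72 dB, and a 100 Hz Nyquist frequency at a 200 Hz sampling rate). The `quantize` function is a hypothetical illustration of how a generic converter might map volts to machine units; it is an assumption, not a description of the Datel hardware.

```python
import math

BITS = 12                    # converter resolution (from the text)
V_MIN, V_MAX = -10.0, 10.0   # input voltage range in volts (from the text)
SAMPLE_RATE_HZ = 200.0       # sampling rate (from Table 1)

levels = 2 ** BITS                          # number of discrete machine units
volts_per_unit = (V_MAX - V_MIN) / levels   # size of one quantization step
max_snr_db = 20 * math.log10(levels)        # full range relative to one step
nyquist_hz = SAMPLE_RATE_HZ / 2             # highest representable frequency


def quantize(volts):
    """Map a voltage to a machine unit in 0..levels-1 (assumed mapping)."""
    unit = int((volts - V_MIN) / volts_per_unit)
    return max(0, min(levels - 1, unit))


print(f"{levels} levels, {volts_per_unit * 1000:.2f} mV per machine unit")
print(f"maximum signal-to-noise ratio: {max_snr_db:.1f} dB")
print(f"Nyquist frequency: {nyquist_hz:.0f} Hz")
```

Under these assumptions one machine unit is a little under 5 mV, which is the scale of the quantization error that later differentiation will amplify.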


IV. Digital Filtering

As shown in Figure 1, there is a certain amount of noise in the digitized
movement signal, due to quantization and other noise sources in the signal
path from transducer to computer. The root mean square (RMS) amplitude of the
calibration trial is twelve machine units, indicating that the maximum
signal-to-noise ratio is 341:1 (4096:12), or 51 dB, for full range data. This
noise affects the accuracy of any measurements made on the displacement data.
Furthermore, much of the noise is in the high-frequency range, above any
frequencies of interest, and since differentiation acts as a high-pass filter
(spectral components are amplified in direct proportion to their frequencies),
the dominant characteristic of a doubly-differentiated displacement waveform
is usually the noise itself. Velocity data derived from a single
differentiation step are also seriously affected by the amplified
high-frequency noise. It is thus extremely important to minimize the effects
of noise while at the same time leaving the "signal" minimally affected. The
severity of the high-frequency amplification introduced by differentiation
requires that we low-pass filter the raw displacement data when deriving
velocity. We use the same filter on the resultant velocity data when deriving
acceleration. Since there are still significant high-frequency noise
components in the observed acceleration waveforms, we again use the same
filter on the derived acceleration data. Figure 2 shows parallel examples of
unfiltered (left) and filtered (right) displacement, velocity, and
acceleration data, from lower lip movements in speech production. As can be
seen, the filter removes only the high-frequency components of the signals.
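
A small worked example may help show why doubly differentiated data tend to be dominated by noise. Because an ideal differentiator multiplies each spectral component by its angular frequency, a weak high-frequency component gains on a strong low-frequency one at every differentiation. The amplitudes and frequencies below are hypothetical values chosen only for illustration; they are not measurements from the text.

```python
import math


def derivative_gain(freq_hz, order=1):
    """Amplitude gain of an ideal differentiator applied `order` times."""
    return (2 * math.pi * freq_hz) ** order


# Hypothetical components, for illustration only.
signal_hz, signal_amp = 5.0, 1.0    # a low-frequency movement component
noise_hz, noise_amp = 80.0, 0.01    # a weak high-frequency noise component


def snr_after(order):
    """Signal-to-noise amplitude ratio after `order` ideal differentiations."""
    s = signal_amp * derivative_gain(signal_hz, order)
    n = noise_amp * derivative_gain(noise_hz, order)
    return s / n


for order, name in enumerate(["displacement", "velocity", "acceleration"]):
    print(f"{name:>12}: signal-to-noise = {snr_after(order):.2f}:1")
```

With these assumed values the ratio falls by a factor of 16 (the ratio of the two frequencies) at each differentiation, from 100:1 in displacement to below 1:1 in acceleration, which is the situation the low-pass filtering is meant to prevent.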

The filter we use is a seven-point Bartlett (triangular) window
(Oppenheim & Schafer, 1975), a smoothing algorithm that is very simple in the
time domain (as opposed to filters which use many more points). In addition
to simplicity, it has two further desirable properties. First, high-frequency
components are greatly reduced while the lower frequencies are relatively
unaffected. Second, this window is symmetrical in shape and so introduces no
phase shift. The computed frequency response of the filter is shown in Figure
3; see Table 1 for further details.

Figure 1. Amplitude spectra (logarithmic; see footnote 2) of a typical
repetitive lip movement sequence (/apaepaspa/) (top; 256 points in
the FFT) and a calibration trial (bottom).

Figure 2. Examples of unfiltered (left) and filtered (right) displacement,
velocity, and acceleration waveforms, reiterant speech.

Figure 3. Computed frequency response of the seven-point Bartlett window.
[Horizontal axis: frequency (Hz).]

Figure 4. Waveforms of synthesized data, unfiltered. Clockwise from upper
left: no added broadband noise, signal-to-noise ratio = 1:1, 4:1,
and 32:1.

To observe the effects of the Bartlett filter on a known signal similar
to collected data, we filtered simulated data created by our digital
sinewave synthesis package (Rubin, 1982), using a fundamental of 5 Hz and two
harmonics, at 10 and 15 Hz. To this signal we added three levels of broadband
noise, with resultant signal-to-noise ratios of 1:1, 4:1, and 32:1. Note that
the last of these ratios is still less than that observed during the above
calibration trial. The waveforms are shown in Figure 4, and their respective
amplitude spectra in Figure 5. All of these signals were then filtered using
the seven-point Bartlett window; the resultant waveforms and their amplitude
spectra are shown in Figures 6 and 7. Comparing the two spectral plots for the
noise-free signal, unfiltered and filtered, it can be seen that the filter
leaves the peaks of the three sinewave components largely unaffected: the 15
Hz peak is lowered only 1 dB. The same relative lack of attenuation of the
three main peaks is observed for the noisy waveforms, while the noise content
at the higher frequencies is lowered considerably. Visual inspection of the
smoothed waveforms shows that even in the 1:1 signal-to-noise condition the
overall signal pattern is retained after filtering. The filter performs as
intended across the entire frequency range, reducing high-frequency noise
while leaving low-frequency variations largely unaffected.
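
The behavior of the seven-point Bartlett window described above can be checked numerically. The Python sketch below computes the window's frequency response from the Table 1 weights and then repeats the noise-free part of the synthesis test: a 5 Hz fundamental with harmonics at 10 and 15 Hz is filtered and the surviving 15 Hz amplitude is measured. Equal unit amplitudes for the three components are an assumption, since the text does not give them, and all function names are illustrative.

```python
import cmath
import math

FS = 200.0                                   # sampling rate in Hz
WEIGHTS = [0, 1/9, 2/9, 1/3, 2/9, 1/9, 0]    # seven-point Bartlett window


def bartlett_response(freq_hz, fs=FS):
    """Complex frequency response of the window, centered on sample t."""
    w = 2 * math.pi * freq_hz / fs
    return sum(wt * cmath.exp(-1j * w * (k - 3)) for k, wt in enumerate(WEIGHTS))


def smooth(x):
    """Apply the window, keeping only samples with full coverage."""
    return [sum(wt * x[i + k - 3] for k, wt in enumerate(WEIGHTS))
            for i in range(3, len(x) - 3)]


def amplitude_at(x, freq_hz, fs=FS):
    """Single-bin DFT amplitude; exact when x spans whole periods."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, v in enumerate(x))
    return 2 * abs(acc) / len(x)


# Noise-free test signal (unit amplitudes are an assumption).
signal = [sum(math.sin(2 * math.pi * f * i / FS) for f in (5, 10, 15))
          for i in range(406)]
filtered = smooth(signal)    # 400 samples, i.e. 10 periods of the fundamental

for f in (5, 10, 15, 31):
    gain = abs(bartlett_response(f))
    print(f"{f:>3} Hz: gain {gain:.3f} ({20 * math.log10(gain):+.2f} dB)")
print("15 Hz amplitude after filtering:", round(amplitude_at(filtered, 15), 3))
```

Because the weights are symmetric about sample t, the computed response is purely real, which is the zero phase shift noted in the text; the gain at 31 Hz comes out close to one-half, matching the cutoff listed in Table 1.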


V. Digital Differentiation 

With filtered displacement data, reasonable estimates of time derivatives
can be computed. Many algorithms exist for the digital computation of
derivatives, some more complicated than others. As in the case of filtering,
we use an algorithm which is very simple in the time domain, the two-point
central-difference algorithm (CDA):

     x'(t) = (x(t+1) - x(t-1))/2h,

where x'(t) is the computed derivative at time sample t, x(t+1) the sampled
value of the original waveform at t+1, x(t-1) the sampled value at t-1, and h
the sample time interval, in this case, 5 ms. As well as simplicity, this
algorithm possesses desirable accuracy, frequency response, and phase shift
properties. In an experimental comparison of five derivative algorithms (the
two-point forward-difference algorithm, the two- and four-point CDA, and five-
and seven-point second-order fits), Marble, McIntyre, Hastings-James, and Hor
(1981) found this algorithm to be the most accurate for twelve-bit data. In
the frequency domain, an ideal differentiator amplifies spectral components in
direct proportion to their frequencies, but since the two-point CDA is an
averaging process, i.e., computed over more than one sample value, it acts as
an ideal differentiator in series with a low-pass filter (Bahill, Kallman, &
Lieberman, 1982). The frequency response of the central-difference algorithm
is shown in Figure 8 along with that of an ideal differentiator (note the
linear scale). The response is within 10% (-.9 dB) of ideal at 25 Hz and 29%
(-3 dB) at 44 Hz. Thus, the spectral components of movement, most of which are
below 44 Hz (see Rozendal, 1984, for a review), are affected very little. The
algorithm's attenuation of the higher frequencies is desirable, since noise
components dominate in this range. That is, the low-pass filtering action of
the CDA offsets the high-frequency amplification introduced by differentiation
to a certain extent. Finally, unlike forward- or backward-difference
algorithms, the CDA is a purely anti-symmetric function and as such introduces
no frequency-dependent phase shifts. As a consequence, exact time
correspondence between sampled displacement, computed velocity, and computed
acceleration data is obtained.
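
The frequency-response figures quoted above follow from applying the central-difference formula to a sinusoid: the result is the true derivative scaled by sin(wh)/(wh), where w is the angular frequency and h the sample interval. The Python sketch below evaluates that ratio at the frequencies mentioned in the text and in Table 1. The function names are illustrative and not taken from the original software.

```python
import math

FS = 200.0       # sampling rate in Hz
H = 1.0 / FS     # sample interval, 5 ms


def central_difference(x, h=H):
    """Two-point central difference: x'(t) = (x(t+1) - x(t-1)) / 2h."""
    return [(x[i + 1] - x[i - 1]) / (2 * h) for i in range(1, len(x) - 1)]


def cda_relative_gain(freq_hz, h=H):
    """Gain of the two-point CDA divided by that of an ideal differentiator.

    For sin(w t) the CDA returns (sin(w h) / h) * cos(w t), whereas the true
    derivative is w * cos(w t), so the ratio is sin(w h) / (w h).
    """
    wh = 2 * math.pi * freq_hz * h
    return math.sin(wh) / wh


for f in (25, 44, 60):
    g = cda_relative_gain(f)
    print(f"{f} Hz: {g:.3f} of ideal ({20 * math.log10(g):+.2f} dB)")
```

The ratio is real and positive below the Nyquist frequency, so the CDA output keeps the constant 90-degree phase relation of a true derivative while its amplitude rolls off smoothly with frequency.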
Figure 5. Amplitude spectra of the waveforms in Figure 4.

Figure 6. Waveforms of synthesized data, filtered with the seven-point
Bartlett window. Clockwise from upper left: no added broadband
noise, signal-to-noise ratio = 1:1, 4:1, and 32:1.

Figure 8. Frequency response (linear amplitude scale) of the two-point
central-difference algorithm (solid line) and the ideal differentiator
(dotted line).

VI. Evaluation of the Digital Processing Sequence

As an overall evaluation of our processing sequence, we examined the
acceleration characteristics of a set of repetitive movements whose
fundamental frequency averaged 4.65 Hz, a rate similar to that observed in
reiterant speech tasks (Kelso et al., 1985). Both angular displacement and
acceleration were directly transduced (the former by a linear potentiometer,
the latter by an Entran EGC-240-10D accelerometer), recorded on FM tape, and
sampled at 200 Hz, as discussed above. The accelerometer's frequency response
was found to be flat within .5 dB from 20 to 100 Hz. Derived acceleration data
were obtained from the displacement data via the steps outlined in Section II,
using the Bartlett filter and the central-difference algorithm. The amplitude
spectra of the directly transduced and derived acceleration waveforms (see
Figure 9) display strong similarities up to the frequency at which noise
predominates in the directly transduced signal (approximately 35 Hz). There is
some slight amplification in the derived signal's spectral components relative
to those of the directly transduced signal from about 20 to 30 Hz; in this
range, spectral components are proportionally amplified by the
central-difference algorithm, and the filtering effects of both the CDA and
the Bartlett filter are small. However, the main frequency characteristics of
the directly transduced signal are closely reflected in the derived signal,
and high-frequency noise is drastically reduced. Moreover, the waveforms
themselves are quite similar, and a sample-by-sample cross-correlation of the
two waveforms revealed a Pearson r of .98. These results indicate that the
sequence of steps we use to derive acceleration from displacement produces
data closely approximating that obtained from direct acceleration
transduction.
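
The sample-by-sample comparison reported above can be illustrated with a short sketch. In the Python fragment below, a noise-free 4.65 Hz sinusoid stands in for the recorded displacement, and its analytic second derivative stands in for the accelerometer signal; both substitutions are assumptions made for illustration, so the resulting correlation says nothing about the .98 obtained with real recordings. Function names and the edge handling are likewise hypothetical.

```python
import math

FS, F0 = 200.0, 4.65     # sampling rate; movement frequency from the text
H = 1.0 / FS
WEIGHTS = [0, 1/9, 2/9, 1/3, 2/9, 1/9, 0]


def smooth(x):
    """Seven-point Bartlett window, keeping only fully covered samples."""
    return [sum(w * x[i + k - 3] for k, w in enumerate(WEIGHTS))
            for i in range(3, len(x) - 3)]


def diff(x):
    """Two-point central difference."""
    return [(x[i + 1] - x[i - 1]) / (2 * H) for i in range(1, len(x) - 1)]


def pearson_r(a, b):
    """Sample-by-sample Pearson correlation of two equal-length waveforms."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


t = [i * H for i in range(400)]
displacement = [math.sin(2 * math.pi * F0 * ti) for ti in t]

# Derived acceleration: filter, differentiate, filter, differentiate, filter.
derived = smooth(diff(smooth(diff(smooth(displacement)))))

# Stand-in for the accelerometer: the analytic second derivative, trimmed by
# the 11 samples that the processing sequence drops at each end.
direct = [-(2 * math.pi * F0) ** 2 * math.sin(2 * math.pi * F0 * ti)
          for ti in t[11:-11]]

r = pearson_r(direct, derived)
print(f"Pearson r between direct and derived acceleration: {r:.4f}")
```

For a single noise-free sinusoid every step is either zero-phase or an exact 90-degree shift, so the two waveforms differ only by a scale factor and the correlation is essentially 1; with recorded data, noise and transducer differences lower it.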

[Figure panels: DIRECTLY TRANSDUCED ACCELERATION (left), DERIVED ACCELERATION
(right); time scale bar: 200 ms.]

Figure 9. Acceleration waveforms and amplitude spectra (logarithmic) obtained
from an accelerometer (left) and doubly differentiated displacement
data (right).

VII. Summary 

The processing of kinematic data raises a number of problems for the
investigator, such as the presence of noise in the analog signals,
quantization noise, and the characteristics of differentiation itself. Careful
selection of filtering and differentiation procedures plays a crucial role in
determining the precision of experimental results. These issues received
attention in the present note, in which we document and evaluate the various
signal processing steps for treating movement data. To some extent, an acid
test of the effectiveness of our system rests in the comparison between
acceleration measurements derived from displacement data (an inherently noisy
process due to double differentiation) and the raw signals directly provided
by an accelerometer. As shown in part VI of the present note, the match
between these independent measures of the same motion is quite impressive,
both in terms of their waveforms and spectral properties. Few direct
comparisons of this kind exist in the literature, and not all of them are
well-matched (e.g., Pezzack, Norman, & Winter, 1977). This outcome is
reassuring, since the accuracy of one's measures is a major factor limiting
their interpretation.

¹The basic movement analysis software package, which has been extended in
many ways, was developed by P. Rubin, whose help is much appreciated.

²All amplitude spectra were computed via the Fast Fourier Transform (FFT)
and plotted using the logarithmic dB scale, 20 log(amplitude/2047.5) (except
Figure 8, which is on a linear scale). A machine-unit value of 2047.5
corresponds to one-half of the full-scale twelve-bit data range (0 to 4095).
Unless otherwise indicated, 1024 sample points were used in each of the FFT
computations.
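
The dB scale described in footnote 2 amounts to a one-line conversion. The Python sketch below is a minimal illustration of it; the function name and the example amplitude are hypothetical, and the logarithm is assumed to be base 10, as is conventional for decibels.

```python
import math

HALF_FULL_SCALE = 2047.5   # one-half of the twelve-bit range 0..4095


def to_db(amplitude):
    """Convert an amplitude in machine units to dB re one-half full scale."""
    return 20 * math.log10(amplitude / HALF_FULL_SCALE)


# Example: a spectral component with an amplitude of 12 machine units.
print(f"{to_db(12):.1f} dB")
```

On this scale a component at one-half full scale plots at 0 dB, and every halving of amplitude lowers the plotted value by about 6 dB.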
SPONTANEOUS KICKING IN VERY YOUNG INFANTS: EVIDENCE FOR A DYNAMIC BILATERAL
SYSTEM

Esther Thelen,† Karl D. Skala,† and J. A. Scott Kelso††

Abstract. The effects of added mass to one leg on the spontaneous
kicking movements of normal six-week-old infants were studied while
infants kicked both in and out of water. Weighting decreased the
kick rate of the weighted leg but increased the rate of the
unweighted leg so that an overall rate was preserved. In the dry
condition, infants maintained the baseline amplitude and velocity of
the weighted leg, but showed significant increases in these
kinematic variables in the unweighted leg. These effects were ameliorated
by submersion. Involuntary infant kicks exhibit a kinematic structure
similar to that of adult voluntary movements and, like adult movements,
appear to be organized bilaterally as a single functional unit.

When people perform skilled actions, their movements are smooth and
efficient and show a high degree of coordination among the limbs and body
segments. Analyses of such varied tasks as walking, reaching, posture control,
and speech suggest that coordination is achieved by recruiting the muscle
groups involved as a single functional synergy rather than as independent
muscles (Bernstein, 1967; Craik, Herman, & Finley, 1976; Fowler, 1977;
Grillner, 1975; Kelso, Southard, & Goodman, 1979; Nashner, 1977; Saltzman,
1979; Shik & Orlovskii, 1976; Turvey, 1977). Such functional linkages may
span several joints or limbs, and even muscle groups quite distant from the
moving segment may participate (Belen'kii, Gurfinkel, & Pal'tsev, 1967;
Marsden, Merton, & Morton, 1983). Recent studies of speech movements, for
example, show that when one element in the ensemble is perturbed, adjustments
are made in the entire functional unit (Abbs & Gracco, 1984; Kelso, Tuller,
V.-Bateson, & Fowler, 1984).

What are the developmental origins of coordination and when do such
functional units develop? In stark contrast to adult performance, the
movements of very young infants appear at first glance to be jerky and
uncoordinated. Nonetheless, we report here that a unitary structuring and
dynamic organization of movements is present at a very early age. First, we
demonstrate that in six-week-old infants the motions of the legs are coupled:
perturbations of the dynamic parameters of one leg are reflected in the
space-time behavior of the opposite leg. Additionally, we show that although
infants of this age have little or no voluntary control of their movements
(Conel, 1941), the kinematic structure of infant leg kicks is similar to that
of adult voluntary movements.

When awake infants are placed supine, they perform rapid, cyclic leg
kicks. This kicking in young infants is a spontaneous behavior; it is
seemingly involuntary and requires no specific eliciting stimuli. As infants
become more behaviorally aroused, they move and kick more (Thelen, Bradshaw, &
Ward, 1981). Kinematic analyses have shown that within each limb, movements
are indeed not random, but highly coordinated. The three joints of an
individual leg (hip, knee, and ankle) are tightly phase-locked and move in
temporal and spatial synchrony (Thelen & Fisher, 1983). Such phase coupling
between the limbs was not evident, however. Although at two weeks of age, leg
kicks largely alternate between right and left legs, infants become more
asymmetrical between one and four months of age and often kick with only one
leg (Thelen, Ridley-Johnson, & Fisher, 1983).

Contemporary theories of motor control emphasize the importance of the
biodynamic characteristics of the moving system (its mass, stiffness, damping,
and the equilibrium position of the muscles) in determining the kinematics of
the resulting movement (Bizzi, Polit, & Morasso, 1976; Cooke, 1980;
Fel'dman, 1966; Kelso, 1977; Kugler, Kelso, & Turvey, 1980). The early
months of infancy are a period of especially rapid growth, and a natural
consequence of growth is an increase in leg mass. Thus, we asked whether infant
movements would be sensitive to experimentally-induced changes in mass, and
especially whether these dynamic alterations would have consequences for
bilateral coordination. To do this, we added weight to one leg of six-week-old
infants as they spontaneously kicked in the supine position both in and out of
water. The resulting rate of kicking in each leg (loaded or unloaded) as well
as the amplitude, velocity, and duration of kicks were examined. Systematic
effects of this unilateral weighting on both legs would indicate that the
bilateral system was functioning as a unitary ensemble. Submersion in water, on
the other hand, should ameliorate the effects of weighting.

Twelve 6-week-old, normal infants (6 boys and 6 girls) served as subjects.
Each infant was seen for two sessions separated by two or three days;
one session was conducted with the infants' legs submerged in water. At each
session, the infants' clothes were removed and the joints of their right legs
marked with tape to facilitate movement analysis. They were placed on a
lightly padded cot that inclined their torsos at a 25 degree angle and that
allowed free movement of their limbs. During the submersion session the cot
was placed in a 92 x 32 x 45 cm (32 gal) aquarium filled with sufficient
pleasantly warm water to reach the infants' nipple line. Each session (wet
and dry) commenced with a 3 minute baseline condition where infants kicked
normally. This was followed by 3 minutes each of weighting either the right
or left leg. The order of the wet and dry sessions and the laterality of the
leg weighting were counterbalanced in the sample. Leg weighting consisted of a
total of 185 grams of lead shot sewn into two fabric strips fastened with
Velcro to the thigh and calf.¹ Sessions were videotaped from both a lateral
and overhead view. Infants' arousal levels were monitored every 16 seconds
during the taping session using a 6 point scale ranging from (1) asleep to (6)
highly distressed.

In an earlier study, the rate of upright stepping in both legs was reduced
in one-month-old infants by bilateral weighting of the legs (Thelen,
Fisher, & Ridley-Johnson, 1984). We now asked whether the unilateral mass
manipulations would affect the relative rate of right and left kicks. To do
this, two independent observers viewed the overhead videotapes and scored the
number of right and left kicks.² In Table 1, we show that weighting one leg
decreased the kick rate in the weighted leg and increased the rate in the
unweighted leg in both wet and dry conditions. The overall kick rate, the sum
of both right and left kicks over the time interval, did not change, however.
Generalized arousal was not a factor; there was no significant difference in
arousal scores as a function of weighting or water conditions. The effect of
adding weights was to decrease the kick rate in the weighted leg and produce a
concomitant increase in the rate of the unweighted leg. Submersion did not
lessen the effect of weighting on kick rate. Thus, although the kick rate of
individual legs changed systematically with weighting, the overall kick rate
was preserved, strongly suggesting that bilateral kicking was produced as a
single unitary activity.
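
The bookkeeping behind this comparison can be sketched in a few lines of
Python. The numbers are the mean kick rates reported in Table 1; the
dictionary layout and the names KICK_RATES and summarize are illustrative
assumptions and are not part of the original analysis.

```python
# Mean kick rates (kicks/min) transcribed from Table 1.
# Keys: (water condition, weighted leg) -> (right-leg rate, left-leg rate).
KICK_RATES = {
    ("dry", "none"): (6.25, 7.77),
    ("dry", "left"): (10.57, 6.32),
    ("dry", "right"): (5.41, 10.54),
    ("wet", "none"): (8.17, 7.58),
    ("wet", "left"): (13.89, 5.96),
    ("wet", "right"): (5.37, 10.67),
}


def summarize(rates):
    """Return the total kick rate and the right-leg share per condition."""
    summary = {}
    for condition, (right, left) in rates.items():
        total = right + left
        summary[condition] = {
            "total": round(total, 2),
            "right_share": round(right / total, 3),
        }
    return summary


if __name__ == "__main__":
    for (water, weight), row in summarize(KICK_RATES).items():
        print(f"{water:>3} {weight:>5}: total={row['total']:.2f} "
              f"right share={row['right_share']:.3f}")
```

The right-leg share rises when the left leg is weighted and falls when the
right leg is weighted. Whether the totals differ reliably across conditions
is a question for the ANOVA reported with Table 1, not for this arithmetic.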

We next asked if the kinematic variables of amplitude, velocity, and
movement duration (period) were responsive to the unilateral weighting
manipulation. From the 12 subjects observed, we chose the six with the highest
overall kick rate for detailed movement analysis. For each of these six
infants a naive observer selected a 5 sec representative sample of continuous
kicking movements for each condition in each session. This procedure yielded
an average of 7.8 flexion and extension movements per infant in each condition
in the dry session (range: 4-12 movements) and 8.1 in the wet session (range:
2-18 movements). Movements were analyzed by digitizing the position of the
hip, knee, and ankle joints of the right leg every 16 ms, i.e., the videoframe
sampling rate was 60 Hz. Resulting displacements were filtered for
high-frequency noise using a numerical filter (Winter, 1979) based on a Fourier
analysis for dominant frequencies (Thelen & Fisher, 1983). Movement amplitudes
and durations were determined from the maxima and minima of the excursions of
the knee joint along the x-axis, parallel to the torso of the infant. For each
discrete movement excursion the peak velocity was identified as the maximum
instantaneous velocity associated with that movement.
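
A minimal sketch of these excursion measures, assuming the knee x-coordinate
has already been digitized at 60 Hz and filtered. The function names, the
strict-extremum rule, and the central-difference velocity estimate are
illustrative assumptions; the text does not specify the procedure at this
level of detail.

```python
import math


def find_extrema(x):
    """Indices of strict local maxima and minima of a sampled signal."""
    return [
        i for i in range(1, len(x) - 1)
        if (x[i] > x[i - 1] and x[i] > x[i + 1])
        or (x[i] < x[i - 1] and x[i] < x[i + 1])
    ]


def excursion_measures(x, fs=60.0):
    """Amplitude, duration (s), and peak velocity for each excursion.

    An excursion is the movement between two successive extrema.
    Velocity is estimated by central differences (an assumption).
    """
    extrema = find_extrema(x)
    measures = []
    for start, end in zip(extrema, extrema[1:]):
        speeds = [abs(x[i + 1] - x[i - 1]) * fs / 2.0
                  for i in range(start, end + 1)]
        measures.append({
            "amplitude": abs(x[end] - x[start]),
            "duration": (end - start) / fs,
            "peak_velocity": max(speeds),
        })
    return measures


if __name__ == "__main__":
    # Synthetic 1 Hz oscillation of amplitude 10 (arbitrary units), 2 s long.
    signal = [10.0 * math.sin(2.0 * math.pi * n / 60.0) for n in range(120)]
    for m in excursion_measures(signal):
        print(m)
```

Real records contain plateaus and residual noise, so a practical version
would need a tolerance for what counts as an extremum.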

Figure 1 illustrates the typical right leg x-axis displacements in the
unweighted, left-weighted, and right-weighted dry conditions, and in Table 2
we present the means for the sample. In the dry session, weighting the right
leg resulted in infants' maintaining the baseline amplitude and velocity of
the weighted leg. When the left leg was weighted, however, the amplitude and
velocity of the right, unweighted leg were dramatically increased. The
duration of the movement excursions remained unaffected.³ In the water, there
was no significant effect of weighting on amplitude, and a trend to a decreased
velocity and increased duration in the right leg when it was weighted.

These experiments revealed remarkable self-organizing properties in the
motor systems of six-week-old infants. When faced with an increased load
perturbation to a single leg, infants acted to maintain a consistent overall
kick rate within the bilateral system and the amplitude and velocity of the
weighted leg. To do this, they must have sensed the load perturbation and
increased the neural activation levels delivered to the system as a whole. This
system-wide increase was, in turn, manifested as higher amplitude and velocity
in the opposite, unweighted leg. Placing the infants in water substantially
ameliorated the effects of the added mass; amplitude and velocity were not
reliably increased in the unweighted leg and there was some evidence that
water reduced these variables in the weighted leg, perhaps due to increased
drag. Thus, the kinematics of the unperturbed leg reflected the dynamic
Table 1

Kick Rate and Arousal Scores as a Function of Submersion and
Unilateral Leg Weighting

                                      Leg weight condition
                              No weight   Left weighted   Right weighted

Kick rate (mean kicks/min)*
  Dry
    Right leg                    6.25         10.57             5.41
    Left leg                     7.77          6.32            10.54
    Total                       14.02         16.89            15.95
  Wet
    Right leg                    8.17         13.89             5.37
    Left leg                     7.58          5.96            10.67
    Total                       15.75         19.85            16.04

Arousal score (a)
  Dry                            3.77          4.05             4.18
  Wet                            3.83          4.33             4.29

* Repeated measures ANOVA using kick rate as the dependent variable and
water condition, weight condition, and kick laterality as independent
variables showed no main effects for dry/wet condition, leg weight condition,
or kick laterality. The interaction between leg weight condition and laterality
was highly significant, F(2,22) = 21.88, p < .001. No other interactions were
significant.

(a) Arousal scale: 1 (asleep), 2 (drowsy), 3 (awake, alert, few movements),
4 (awake, alert, active), 5 (fussy), 6 (highly distressed).
Figure 1. Horizontal (x-axis) excursions of the right leg of a six-week-old
infant (dry condition) when the legs were unweighted, and when the
left and right legs were weighted.



manipulations of the other limb. These results are consistent with a fixed
power source being rationed out, as it were, over the whole kicking system, an
effect accomplished presumably without cognitive intervention.⁴

Recent theories of motor control have shown that voluntary movements can
be usefully modeled as a dynamical mass-spring system whose parameters,
stiffness and equilibrium position, can be set by the nervous system (Bizzi et
al., 1976; Cooke, 1980; Fel'dman, 1966; Kelso, 1977; Kugler et al., 1980). In
several ways, these spontaneous and involuntary infant movements meet
predictions of such a model.⁵ For example, as in many adult voluntary movements
such as speaking and reaching (Cooke, 1980; Jeannerod, 1984; Ostry, Keller,
& Parush, 1983), we found a strong linear relationship between movement
amplitude and peak velocity in infant kicks (Table 2). Fifteen of eighteen
correlation coefficients of the individual amplitude/peak velocity regression
lines were significant at p < .05 or less in the dry session (all weight
conditions) and 14 of the 18 possible were significant in the wet sessions
(mean Pearson's r = .818 in dry and .848 in wet). According to the mass-spring
model, the slope of the peak velocity-amplitude regression equation is an
estimate of system stiffness, which can be adjusted by the length-tension
relation between agonist and antagonist muscles (Bizzi et al., 1976; Cooke,
1980; Fel'dman, 1966; Kelso, 1977; Kugler et al., 1980). In Table 2, we show
that in the dry condition, estimated stiffness increased in both weighted
conditions over
Table 2

Kinematics of Right Leg Movements as a Function of Submersion
and Unilateral Leg Weighting

                               Amplitude   Peak vel.   Amp-vel.  Amp-vel.  Movement
                               (arb.       (arb.       corr.     slope     duration
                               units)*     units)                          (s)

Dry condition
  Right leg, no weights          50.95      118.97      .840     1.5[?]     .387
  Right leg, left weighted       85.77      220.41      .846     3.11       .397
  Right leg, right weighted      50.07      110.15      .845     2.79       .360

Wet condition
  Right leg, no weights          72.36      170.24      .881     2.37       .438
  Right leg, left weighted       75.90      186.96      .740     1.79       .410
  Right leg, right weighted      65.37      121.08      .765     1.32       .615

* Repeated measures ANOVAs using amplitude, velocity, and movement duration
as dependent measures showed a significant main effect of weighting on
amplitude, F(2,10) = 4.23, p < .05, and velocity, F(2,10) = 6.72, p < .02, but
no effect of dry or wet sessions and no interactions. Post-hoc comparisons
showed significant (p < .05) differences in both amplitude and velocity in the
dry condition: Left weighted differed significantly from the no weight and
right weighted conditions, which did not differ significantly from each other.
In the wet condition, there was a significant difference in velocity between
left and right weighted conditions, as well as between the no weight and right
weighted conditions. Each of the six infants showed this pattern of results
in the dry condition: maintenance of the amplitude and velocity in the weighted
leg and increases in these variables in the unweighted leg. There were no main
effects on movement duration, but an interaction between dry/wet sessions and
weight conditions, F(2,10) = 4.61, p < .04.
the baseline, but this effect was not observed in the water, presumably due in
part to the combined influence of gravity and the fluid medium.
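
The stiffness estimate discussed here is the slope of a line fitted to
(amplitude, peak velocity) pairs. A minimal sketch follows, assuming an
ordinary least-squares fit with an intercept; the text does not say whether
the original regressions included an intercept, and the function name and the
sample numbers are hypothetical.

```python
import math


def amp_velocity_fit(amplitudes, peak_velocities):
    """Fit peak_velocity = slope * amplitude + intercept; also return r."""
    n = len(amplitudes)
    if n < 2 or n != len(peak_velocities):
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_a = sum(amplitudes) / n
    mean_v = sum(peak_velocities) / n
    s_aa = sum((a - mean_a) ** 2 for a in amplitudes)
    s_vv = sum((v - mean_v) ** 2 for v in peak_velocities)
    s_av = sum((a - mean_a) * (v - mean_v)
               for a, v in zip(amplitudes, peak_velocities))
    if s_aa == 0.0 or s_vv == 0.0:
        raise ValueError("amplitudes and velocities must each vary")
    slope = s_av / s_aa                   # stiffness estimate in the model
    intercept = mean_v - slope * mean_a
    r = s_av / math.sqrt(s_aa * s_vv)     # Pearson correlation
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical excursions from one infant in one condition.
    amplitudes = [22.0, 35.0, 48.0, 61.0, 80.0]
    velocities = [60.0, 88.0, 115.0, 150.0, 190.0]
    slope, intercept, r = amp_velocity_fit(amplitudes, velocities)
    print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

A steeper slope means a given amplitude is covered at a higher peak velocity,
which is how the model reads an increase in stiffness.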

Although six-week-old infants do not show the tight in-phase or alternating
coupling of mature limb movements, each limb does not act independently.
In particular, the entire bilateral system appears dynamically responsive to
scalar changes in activation levels. One implication is that self-correcting
mechanisms are inherent in the neuromuscular system without higher level
control. This may be one way to preserve movement topographies in growing
organisms without the need for conscious recalibration. In addition, these
results suggest we look at the motor asymmetries of early development in light
of biodynamic changes. If the entire interlimb system is indeed sensitive to
the biomechanical properties of one element, transients or asymmetric changes
of mass or muscle tone may shift the lateral distribution of movements.⁶
Perhaps most important, however, is the demonstration that even at this early
age, the kinematic structure of movement (the peak velocity-amplitude
relation) shares a similarity with adult voluntary movements that involve quite
different anatomical structures. This result hints at a basic law of
biological motion common to developing and developed organisms.
Footnotes

¹ The amount of mass added to the legs was determined to simulate actual
growth changes in leg dynamics, specifically, the amount of weight gained by
infants in their legs between 1 and 2 months of age. Calculations were based
on longitudinal anthropometric measurements of leg circumference from an
earlier study (Thelen et al., in press), and converted to volume and mass using
the formulas in Winter (1979).

² Interobserver reliability on kick rate averaged over all infants,
sessions, and conditions: r = .97.

³ Presumably, infants maintained constant movement duration within a
variable overall rate of kicking by adjusting the pauses between movements.
Although pause lengths were not measured directly in this study, we found
consistent movement durations (app. 300 ms) and pause durations inversely
proportional to kick rate in several earlier studies (Thelen et al., 1981;
Thelen & Fisher, 1983a, 1983b).

⁴ Provine (1982) weighted one wing of 3-5 day old chicks and found that
weighting reduced the frequency of both wings in bilaterally synchronous
drop-evoked flapping. In the supine position, human infants rarely perform
bilaterally synchronous leg movements until about 5 months of age (Thelen et
al., 1983). Further research should ask whether unilateral weighting in human
simultaneous kicks would also produce such a "matching" type of interlimb
coordination.

⁵ Movements according to the underlying dynamics of an undamped
mass-spring system may be characterized by the following equation of motion:

    $m\ddot{x} + k\,\Delta x = 0$

where $m$ = mass, $k$ = stiffness, $\Delta x = (x - x_0)$ with $x_0$ = rest
position; and $x$ and $\ddot{x}$ represent position and acceleration,
respectively. Such systems display cyclic motions with a) period
$T = 2\pi/\omega_0$, where $\omega_0 = (k/m)^{1/2}$; b) amplitude ($A$), which
is determined solely by initial conditions (i.e., initial $\Delta x$ and
$\dot{x}$) and is independent of $k$ or $m$ (and hence period); and c) peak
velocity $V_p = \omega_0 A$. Since $\omega_0 A$ is the peak velocity of simple
harmonic motion, the slope of $\omega_0 A$ versus $A$ is $\omega_0$. Hence,
assuming constant mass, this slope is proportional to $(k)^{1/2}$. See Kelso,
V.-Bateson, Saltzman, and Kay (1985). In infant kicking movements, there were
highly significant correlations (r = .801 in dry; r = .940 in wet) between mean
$T$ (the inverse of $\omega_0$) predicted from the mean $V_p$-$A$ relation for
each infant in each condition and the actual measured duration of motion ($T$).

⁶ There is scattered evidence for both episodic (Lampl & Emde, 1983) and
asymmetric growth patterns (Levy & Levy, 1978; Pande & Singh, 1971; Schultz,
1926).
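
For readers who want the intermediate step behind the relations stated in
footnote 5, they follow from the general solution of the equation of motion.
This is a standard textbook derivation, not taken from the original report;
the phase symbol $\phi$ is introduced here for illustration only.

```latex
\begin{align*}
m\ddot{x} + k\,\Delta x &= 0, \qquad \Delta x = x - x_0,\\
\Delta x(t) &= A\cos(\omega_0 t + \phi), \qquad \omega_0 = (k/m)^{1/2},\\
\dot{x}(t) &= -A\,\omega_0 \sin(\omega_0 t + \phi),\\
V_p &= \max_t \lvert \dot{x}(t) \rvert = \omega_0 A,\\
\frac{dV_p}{dA} &= \omega_0 = (k/m)^{1/2}
  \quad\Longrightarrow\quad k = m\,\omega_0^{2}, \qquad T = \frac{2\pi}{\omega_0}.
\end{align*}
```

With mass held constant, a steeper $V_p$-$A$ slope therefore corresponds to a
larger $k$ and a shorter predicted period.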
ON THE COORDINATION AND REGULATION OF COMPLEX SENSORIMOTOR SYSTEMS*
E. L. Saltzman



In his foreword to the English edition of N. A. Bernstein's The
Coordination and Regulation of Movements (1967), A. R. Luria predicted that
readers "will find in this book the brilliant essays of a great man of science
and, no doubt, the ideas contained in these essays will have a marked influence
on the future work" (ibid., p. viii) in biology, physiology, biomechanics,
cybernetics, mathematics, psychology, and philosophy. It is unfortunate that
this seminal collection of articles (translated from Russian and sampling
Bernstein's scientific output from 1934-1962) has been out of print for the
last five years. Now, however, thanks to H. T. A. Whiting and North-Holland
Publishers these papers have been reprinted in an updated format by including
with each original chapter a set of two companion pieces by current
researchers in the area of coordinated sensorimotor activity. Judging from the
conceptual ferment and interdisciplinary breadth evident in these new chapters
(representing, additionally, the more "recent" disciplines of artificial
intelligence and robotics), it is clear that Luria's prediction has been amply
validated. Bernstein's ideas are as alive and vital today as they were when
originally published, and his chapters alone should make this book required
reading for anyone seriously interested in how actions of
multidegree-of-freedom systems are coordinated and controlled.

However, this volume is clearly more than the original Bernstein since,
with only a few exceptions, the companion pieces admirably serve the editor's
purpose of "reassessing" Bernstein's work. The new chapters allow the reader
to see how the field has developed since Bernstein's day and show not only the
extent to which his themes have been explored, amplified, and extended, but
also the extent to which new developments have superseded the old. Some
aspects of Bernstein's work welcome updating. Most notably, his pioneering
cyclogrammetric techniques for analyzing repetitive cyclic motions such as
locomotion or hammering have been greatly improved upon in the intervening
years. Both Woltring's comprehensive chapter on methodology and Rozendal's
methodology section in his chapter on locomotion provide the reader with an
admirable state-of-the-art "synopsis of developments in kinematic data
acquisition and processing, mass distribution parameter estimation, and
kinetic modelling" (p. 38). In a more theoretical vein, Bernstein realized that
one could not assume a simplistic one-to-one "keyboard" mapping between
patterns of efferent neural impulses and resultant articulator movements. Such



*A review of H. T. A. Whiting (Ed.), Human motor actions: Bernstein
reassessed. New York: North-Holland, 1984. Review to appear in Contemporary
Psychology, in press.

"functional non-univocality" (p. 211) is due to the fact that this mapping is
sensitive to the evolving physical context of the movement, e.g., the ongoing
positions and velocities of the articulators, and the spatial relationship
between the articulators and the gravitational field. As pointed out in the
chapters by Boylls and Greene, by Rozendal, and by Hinton, this
non-univocality no longer poses an insurmountable problem for theories of
coordinated movement, and recent biomechanical and robotic control schemes
incorporate and even exploit such indeterminacies.

Additionally, several of the companion chapters provide scholarly
biographical and historical information regarding Bernstein's scientific career
(Boylls and Greene; Requin, Semjen, and Bonnet; Pickenhain; Arbib) as well
as translated excerpts from work previously unavailable in English
(Pickenhain). Thus, for example, we learn that Bernstein won a Soviet State
prize in 1948 for his monograph On the construction of movements (1947) and
are disappointed to find that there is no English translation.

The companion chapters also provide intriguing glimpses into the
subcurrents and crosscurrents that are shaping the field of movement science
today. A healthy discipline is not without its controversies and movement
science is no exception. In these chapters we see that movement science's
health is abundantly apparent, as in the reactions of Requin et al. to the
relatively recent "ecological" approach to action (particularly as outlined in
Reed [1982]) that is itself represented in the chapters by Turvey and Kugler
and by Reed. One sympathizes with Requin et al.'s objections, since the
ecological approach is by no means an easy one to live with intellectually. But
who ever said intellectual life was supposed to be easy, especially if it
involves questioning some of our basic assumptions? And that is exactly what
the ecological scientists would have us do concerning the organization of
sensorimotor behavior. For example, synthesizing Bernstein's theoretical
views with J. J. Gibson's theory of visual perception (e.g., Gibson, 1966,
1979; Michaels & Carello, 1981), Turvey and Kugler argue vigorously against
taking out the "loan of intelligence" (p. 381) that is necessarily entailed
when one invokes processes of cognitive mediation to account for the
"construction" of percepts from sensations, or actions from commands. Viewed
in the context of such challenges to the status quo in the field, these three
chapters represent, in my opinion, a rather lively subunit of the new volume.

However, before leaving the topic of this controversy, I feel compelled to
inject a polite rejoinder to one aspect of Requin et al.'s chapter. They
allude to a group of movement scientists at Haskins Laboratories as
proselytizing "exegetes" (p. 471) or "disciples of a prophet" (p. 470) who
"played a decisive role in the circulation of Bernstein's view" (p. 469). In
fact, in the introductory chapter, Boylls and Greene mention a "Bernstein
'cult'" (p. xix) with reference to Requin et al.'s chapter. Now, I have been
working at Haskins Laboratories for the last three years on problems of speech
and limb coordination, and from my experience these nonsecular metaphors seem
a bit baroque. It seems more appropriate to describe the Haskins group as
critical admirers of Bernstein, and these do not a cult make. A fan club,
maybe, but a cult? Heaven forbid!

Finally, several of the companion chapters highlight developments that,
in this reviewer's opinion, lie at the cutting edge of the field and that both
echo and significantly expand upon two of Bernstein's central themes: his
emphasis on the task or "motor problem" (p. [?]) in identifying functional
units of action, and his insistence that coordinated biological actions be
"regarded as morphological objects...(that)...do not exist as homogeneous
wholes at every moment but develop in time, that in their essence they
incorporate time coordinates" (p. 178-179, italics added). Regarding the
former issue, Boylls and Greene highlight the importance of studying movements
at the "task level of description" (p. xix) and of conceptualizing
coordination in terms of task-dynamical functions that define "a family of
dynamical flows of the sensorimotor state during the elaboration of a task in
time" (p. xxii; see also Saltzman & Kelso, 1983, for related discussions of
task-dynamics). Similarly, Arbib argues for the importance of qualitatively
distinct controller structures in the performance of correspondingly distinct
tasks like ball-catching and waltzing. But what is most significant in Boylls
and Greene's discussion, perhaps, is the incorporation of current concepts
from the nonlinear dynamical systems literature (e.g., "potential
functions...bifurcations and catastrophes" [p. xxiv]) into their theoretical
framework. This approach, in contrast with a traditional linear control-system
account, provides a set of mathematical and theoretical tools for dealing
with issues of self-organization and pattern formation in physical and
biological systems (e.g., Gierer, 1981; Haken, 1983; Prigogine & Stengers,
1984). As discussed in the chapters by Turvey and Kugler and by Arbib, these
approaches should allow one to rigorously formulate problems of sensorimotor
coordination, as Bernstein might have wished, in terms of morphogenetic or
embryological changes in the structure of the "'biodynamic tissue' of live
movements" (p. 178). For example, Bernstein's notion of time coordinates that
are incorporated into the essence of an action (see earlier quote) has been
developed into the concept of an action's intrinsic timing (e.g., Fowler,
1977; Kelso & Tuller, in press). According to this view, a movement's
temporal structure unfolds as an implicit consequence of its intrinsic
task-dynamics rather than as the explicit result of durational commands issued
by an extrinsic executive time-keeper. Significantly, the notion of time as
intrinsic to a system's dynamics appears well grounded in the related physical
concept of internal time derived in the field of irreversible thermodynamics
(e.g., Prigogine, 1984; see also Richardson, 1980, and Richardson & Rosen,
1979).

In conclusion, the present volume lives up to its title and offers a
successful reassessment and update of Bernstein's classic 1967 publication.
Taken together, the original chapters and the companion pieces confirm the
belief expressed by Boylls and Greene in their introduction, that Bernstein's
ideas remain a "renewable resource" (p. xxix) for our attempts to understand
the coordination, regulation, and growth of complex sensorimotor systems.
prelinguistic

Speech Articulation:

timing, theory, methodology, vowels,
coarticulation, Catalan,
VCV sequences

Motor Control:

phase transitions, hand movement, data
processing, sampling, filtering,
differentiation, kicking, infants,
coordination, regulation, sensorimotor
systems, review

Reading:

short-term memory, good readers, poor
readers, beginning readers, deaf readers,
cognitive processes, differential
processes, memory logographic numbers,
phonographic numbers, hemispheric effects,
repetitive naming, word retrieval,
linguistic ability, spelling proficiency,
poor spellers, prediction of skills,
prevention of difficulty, Kana, Kanji